Message from the Conference General Chair

Graphics Interface ’86 and Vision Interface ’86 are another 
ground-breaking endeavour of the Canadian Man-Computer 
Communications Society. We are particularly proud of the fact 
that the event is the 12th Canadian conference devoted to 
computer graphics and the longest running computer graphics 
conference in North America. 

This year we have accepted the Canadian Image 
Processing and Pattern Recognition Society as a partner. In the 
past, CIPPRS has participated in the program, but not as 
extensively nor as visibly. While the joint conference has 
certainly led to some coordination problems, it is hoped the 
synergism generated will be beneficial to all participants. 


It is also significant that this is the first time that Graphics 
Interface is being held in Vancouver. We welcome the 
Vancouver participants who have not had an opportunity to 
attend GI conferences in the past, and we hope for their
continued support. 

Putting together a program of this size and scope is a 
considerable task, and I want to thank the organizers for all 
their work. In particular I would like to mention Mark Green for 
putting together the Graphics Interface program, Morris 
Goldberg and Bob Woodham for the Vision Interface program, 
Gunther Schrack for local arrangements, and Marceli Wein for 
the Proceedings. Without volunteer efforts such as theirs, the 
conference would not be nearly as enjoyable. 


Wayne A. Davis 
President CMCCS and 
Conference Chairman 


Message from the Chairman of the Local Organizing 
Committee 

A wide variety of events is taking place during Graphics 
Interface/Vision Interface ’86. There are two days of tutorials to 
introduce new and interesting topics to those wanting more 
background information; three days of technical sessions 
presenting recent ideas, concepts and results in computer 
graphics and computer vision; a film show of the latest 
applications of technology to the world of entertainment; and a 
banquet and two receptions to allow the opportunity for the 
personal contacts and conversations so important at a 
professional conference. 


Message from the Conference Chair

Graphics Interface ’86 and Vision Interface ’86 are
another innovative undertaking of the Canadian
Man-Computer Communications Society. We are particularly
proud that this twelfth Canadian conference devoted to
computer graphics is the longest-running conference in the
field in North America.

This year, the Canadian Image Processing and Pattern
Recognition Society has joined us in organizing the
conference. CIPPRS has taken part in the programs of
earlier conferences, but never so actively or so visibly.
Although the joint nature of the conference has caused some
coordination problems, we hope that all participants
will benefit from the positive effects of the resulting “synergy”.

It is also important to note that this is the
first time Graphics Interface has been held in Vancouver. We
welcome the Vancouver participants who have not had
the opportunity to attend earlier GI conferences,
and we hope that their participation will
continue at future conferences.

Putting together a program of this scope is
a considerable task, and I wish to thank all the
organizers. I would like to mention in particular Mark Green,
who coordinated the Graphics Interface program, Morris
Goldberg and Bob Woodham, who coordinated the Vision
Interface program, and Marceli Wein, who is responsible
for the proceedings. Without the volunteer participation of these
and other people, the final result would not be as
satisfying.


Wayne A. Davis
President of CMCCS and
Conference Chair


Message from the Chair of the Local Organizing Committee

The program of Graphics Interface/Vision Interface ’86
offers a very wide range of activities. First, two
days of tutorials on new and exciting fields
for those who wish to become familiar
with them; next, three days of technical sessions covering
recent ideas, concepts and results in
computer graphics and computer vision; a
film program showing some recent applications
of these technologies to the performing arts; and
finally, a banquet and two receptions to encourage the
conversations and personal contacts that are such an important
part of a professional conference.

And there is more! Vancouver is an inviting city, nestled
between the mountains and the sea with plentiful fine
restaurants, entertainment, breathtaking sights, and fascinating
neighbourhoods such as Gastown, Chinatown and Granville
Island, to name but a few. Of course there is Expo ’86, an
entire summer of entertainment and information, concentrating 
on transportation and communication of the past, present and 
future. I hope you can find time to take advantage of at least 
some of these attractions. 

A conference of this size depends entirely on the 
dedication of the volunteers who are organizing it. Much work 
has been invested, and I wish to express my sincere thanks to 
all my fellow committee members who gave much time and 
effort to make this conference possible. I hope you will benefit 
from it. 


Gunther Schrack 
University of British Columbia 


Message from the Program Chair Graphics Interface ’86 

Over the years Graphics Interface has developed a 
reputation for a varied and interesting program. It is my hope 
that the 1986 program will meet the standards set by previous 
Graphics Interface conferences. 

This year’s program is a major departure from those of
previous years, in that there are fewer parallel sessions. The 
Program Committee had to be more selective in the review of 
papers. I would like to thank the committee members for the 
extra effort required to review the papers (most of which were 
full papers, as opposed to the extended summaries submitted 
in previous years). The trend towards fewer parallel sessions 
should increase the quality of the papers presented at Graphics 
Interface. 

Another major difference in 1986 is the inclusion of the 
Vision Interface conference. This new conference is being held 
in parallel with Graphics Interface, with sessions open to 
attendees of both conferences. Graphics Interface and Vision 
Interface will share several common sessions and this will 
encourage interaction between these two related fields. 


Mark Green 
University of Alberta 


And that is not all! Vancouver is a very
welcoming city, set between the mountains and the sea, with
an abundance of excellent restaurants, entertainment venues,
breathtaking scenery and picturesque neighbourhoods
such as Gastown, Chinatown and Granville Island.
Not to mention, of course, Expo ’86, which will offer a full summer
of entertainment and information, particularly
on the transportation and telecommunications
of yesterday, today and tomorrow. I hope you will have
time to enjoy at least some of these attractions.

A conference of this scope depends entirely
on the work of the volunteers who organize it. The amount of
work required has been enormous, and I want to thank all the
other members of the organizing committee, who devoted
a great deal of time and energy to this conference; without them,
holding the conference would have been impossible. I hope you
will find the conference worthwhile.


Gunther Schrack
University of British Columbia


Message from the Program Chair of Graphics
Interface ’86

Since its beginnings, Graphics Interface has earned a
reputation for offering a varied and interesting program. I
venture to hope that the 1986 program will meet the demanding
standards set at previous Graphics Interface
conferences.

This year’s program differs markedly from
those of previous years in that it has fewer
concurrent activities. The Program Committee had to
review the submitted papers more rigorously. I
thank the members of the committee for the extra work
required to read the submissions (most of which were
full papers rather than extended summaries as in
previous years). The reduction in the number of
parallel activities should raise the overall quality of the
papers presented at Graphics Interface.

The other very important new aspect in 1986 is
the addition of the Vision Interface conference. This new
conference is held in parallel with Graphics Interface, and all
activities are open to members of both
conferences. Several sessions will be common to Graphics
Interface and Vision Interface, which should encourage
exchanges between these two related fields.


Mark Green
University of Alberta

Message from the Program Co-chairs of Vision Interface

Welcome to Vision Interface ’86, sponsored by the 
Canadian Image Processing and Pattern Recognition Society 
(CIPPRS). It is a great pleasure to begin a new series of vision 
conferences in conjunction with Graphics Interface ’86, the 
12th graphics conference sponsored by the Canadian Man- 
Computer Communications Society (CMCCS). Vision 
Interface ’86 has relied heavily on the organizational work of 
the Graphics Interface ’86 Planning Committee, for which we 
are most grateful. We hope that all attendees will benefit from the joint
tutorial program and technical sessions of the companion 
conferences. As Program Co-chairs of Vision Interface ’86, we 
thank our invited speakers and all those who submitted papers. 
These proceedings become the permanent record of your 
efforts. 


M. Goldberg 

University of Ottawa 

and 

R.J. Woodham 

University of British Columbia 


Message from the Program Co-chairs of Vision
Interface

We welcome you to Vision Interface ’86,
sponsored by the Canadian Image Processing and
Pattern Recognition Society (CIPPRS). It is with
great pleasure that we inaugurate a new series of
conferences on computer vision, in collaboration with Graphics
Interface ’86, the twelfth edition of that conference,
sponsored by the Canadian Man-Computer
Communications Society (CMCCS). Vision Interface ’86 has
benefited greatly from the experience of the Graphics
Interface ’86 organizing committee, to which we express our full gratitude.
We hope that all participants will benefit from the
joint program of tutorials and technical
sessions of the two conferences. As co-chairs of
Vision Interface ’86, we thank the invited speakers
and all those who submitted papers to us. These
proceedings form a permanent record of your
work.


M. Goldberg

University of Ottawa

and

R.J. Woodham

University of British Columbia

Abstract

This paper examines the role of computer graphics
and new media technologies in the fashion industry.
The three phases of computer graphics application
are:

(1) Design/Visualization,

(2) Pattern Creation/Manufacturing, and

(3) Presentation/Promotion.

This paper focuses on the third application, illustrating
some of the new directions in computer graphics
and fashion with examples of the author’s work in
fashion videos and interactive systems.

KEYWORDS: fashion design, pattern CAD/CAM, 
stereoscopic, fashion video. 

Introduction 

Computer graphics has made its impact on most
fields which entail design and imaging: movie and
television special effects, computer-aided design and
manufacturing, computer-aided engineering, molecular
design, medical imaging, seismic analysis for oil
prospecting, as well as on video games and flight
simulators. One design field for which the assimilation and
creative use of computer graphics still poses a
challenge is the fashion industry. Computer imaging
systems are being applied in three areas: Design,
Manufacturing, and Presentation.

The fashion industry is built around a process of
producing a new look. The fashion business is an
unending visual modification of the definition of
society and its image. Ironically, its link to one of the
newest image-making media is not implicit. The
motivation for using computer graphics systems is
clearly defined only in the manufacturing phase: to
obtain the advantages of an automated system. In
the design and presentation phases, it is not yet
evident that the computer medium can enhance
production and creativity. In these areas, traditional
methods are neither easily simulated nor improved.
This paper gives an overview of some applications of
new media technology in the fashion industry.
Emphasis is placed on the final phase; examples from
the author’s work are included to demonstrate how
computer graphics animations can generate an
immediacy and capture the imagination in a way
which is true to fashion.

Design / Visualization 

The computer is a natural medium for the mass 
production of perspective images; this makes it an 
effective tool for engineering and architectural design. 
Garments, however, are characterized by neither rigid
surfaces nor simple geometrical construction.

Accurately modeling a garment requires a data
format that can represent flexible materials and the
effects of physical forces such as gravity and surface
tension. Hierarchical solid modeling operations are
one technique for simulating twisting, bending,
tapering and other such transformations of objects.
“Deformations” [1], a technique developed by Alan Barr,
can be used to simulate flexible geometric objects
made of fabric. With this technique, the normal
vector of an arbitrarily deformed smooth surface can
be calculated directly from the surface normal vector
of the undeformed surface and a transformation
matrix. The deformations are combined in a
hierarchical structure.
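
The normal calculation described above can be sketched for one
simple deformation. The following is a minimal illustration, not
the code used for the work described here: it assumes a linear
taper along the z axis and uses the standard rule that the
deformed normal is the undeformed normal multiplied by the
determinant of the deformation's Jacobian times its inverse
transpose (equivalently, the cofactor matrix of the Jacobian).
The function names and the taper rate are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def taper(point, rate):
    """Assumed deformation: scale x and y by r(z) = 1 + rate * z."""
    x, y, z = point
    r = 1.0 + rate * z
    return (r * x, r * y, z)


def taper_jacobian(point, rate):
    """Jacobian of taper() evaluated at the undeformed point."""
    x, y, z = point
    r = 1.0 + rate * z
    return ((r, 0.0, rate * x),
            (0.0, r, rate * y),
            (0.0, 0.0, 1.0))


def deformed_normal(jacobian, normal):
    """Unit normal of the deformed surface.

    The cofactor matrix of J equals det(J) times the inverse
    transpose of J; its rows are cross products of the rows of J.
    """
    cof = (cross(jacobian[1], jacobian[2]),
           cross(jacobian[2], jacobian[0]),
           cross(jacobian[0], jacobian[1]))
    v = tuple(sum(cof[i][j] * normal[j] for j in range(3)) for i in range(3))
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


# A point on a unit cylinder around the z axis, with its outward normal.
p = (1.0, 0.0, 0.5)
n = deformed_normal(taper_jacobian(p, 0.5), (1.0, 0.0, 0.0))
```

No normals need to be re-estimated from the deformed geometry;
only the undeformed normal and the local Jacobian are required.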

Furthermore, from the level of detail of fiber and
fur to a pattern on a fabric, modeling a garment
requires sophisticated techniques such as texture
synthesis and stochastic modeling [2]. Patterns on flat
fabric can be texture mapped onto the curved surface
of the constructed garment [3]. Also, methods are
being created to warp the surface of an analytically
defined object [4].

The jacket design for a computer-generated
character, User Abuser, demonstrates a method developed
at the Computer Graphics Lab, NYIT, for creating a
realistic three-dimensional model of a garment. The
jacket was modeled as a non-closed surface defined by
polygons in a mesh format. This format consists of a
list of 3D vertex coordinates, with normals and texture
coordinates, topologically connected into a grid with a
certain number of rows and columns [5].
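
A grid-connected mesh of this kind can be pictured with a small
data structure. This is only a sketch under assumptions: the
class and field names are hypothetical, the vertices are stored
in row-major order, and the actual NYIT file format is not
reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    position: tuple  # (x, y, z)
    normal: tuple    # (nx, ny, nz)
    uv: tuple        # texture coordinates (u, v)


@dataclass
class GridMesh:
    rows: int
    cols: int
    vertices: list   # rows * cols vertices, row-major (assumed layout)

    def index(self, r, c):
        return r * self.cols + c

    def vertex(self, r, c):
        return self.vertices[self.index(r, c)]

    def quads(self):
        """Polygons implied by the grid connectivity, as vertex indices."""
        return [(self.index(r, c), self.index(r, c + 1),
                 self.index(r + 1, c + 1), self.index(r + 1, c))
                for r in range(self.rows - 1)
                for c in range(self.cols - 1)]


def make_flat_panel(rows, cols, width, height):
    """Build a flat, open (non-closed) panel in the z = 0 plane."""
    verts = []
    for r in range(rows):
        for c in range(cols):
            u, v = c / (cols - 1), r / (rows - 1)
            verts.append(Vertex((u * width, v * height, 0.0),
                                (0.0, 0.0, 1.0), (u, v)))
    return GridMesh(rows, cols, verts)


panel = make_flat_panel(3, 4, 2.0, 1.0)
```

Because connectivity is implied by the row and column counts, no
explicit polygon list has to be stored with the vertices.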

One of its most important features is flexible
joints at the shoulders and elbows, modeled with flex
software by Richard Lundin [6]. These are necessary
to correctly deform the surface when rotations are
performed at a joint in a tree structure of a
three-dimensional model. Each flexible joint is composed of
a single polygonal surface in mesh format which
changes shape as the joint is flexed; the joint begins at
a particular body part in the model tree and ends at
another body part. A Bézier spline curve, which
defines the centerline of the flexible joint, is created
from control points which include the end points of
the flexible joint and the coordinates of any joint
nodes between the end points. The flex software
determines the joint axis, calculates the two end
coordinates of the joint, and finally distributes and orients
the mesh rows with respect to the centerline of the
flexible joint. This allows the arms to be freely
moved, while leaving the garment’s seams intact!
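
The idea of distributing mesh rows along a Bézier centerline can
be sketched as follows. This is an illustration only and does not
reproduce the flex software: it assumes the rows are spaced
uniformly in the curve parameter (not in arc length) and that
each row is oriented by the unit tangent of the curve. The
function names and the example control points are hypothetical.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between two points."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def de_casteljau(ctrl, t):
    """Evaluate a Bezier curve of any degree at parameter t."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]


def bezier_tangent(ctrl, t):
    """Unit tangent: the derivative is a Bezier curve of one degree lower."""
    n = len(ctrl) - 1
    deriv = [tuple(n * (b - a) for a, b in zip(ctrl[i], ctrl[i + 1]))
             for i in range(n)]
    d = de_casteljau(deriv, t)
    length = math.sqrt(sum(c * c for c in d))
    return tuple(c / length for c in d)


def place_rows(ctrl, row_count):
    """Return (center, tangent) for each mesh row along the centerline."""
    frames = []
    for i in range(row_count):
        t = i / (row_count - 1)
        frames.append((de_casteljau(ctrl, t), bezier_tangent(ctrl, t)))
    return frames


# End points of the joint plus one joint node in between (made-up values).
centerline = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0)]
frames = place_rows(centerline, 5)
```

Each mesh row would then be positioned at its center and rotated
so that its plane is perpendicular to the tangent, which keeps
the sleeve surface continuous as the joint bends.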

The jacket can be rendered with any texture and
color and displayed from any point of view [Figure 1].
The appearance of the jacket also depends on lighting
which, in this case, creates the jacket’s specular
surface.

From the vantage of the computer graphics
industry, the simulation of three-dimensional
garments is a technical problem which will probably gain
more attention. In part, this is because the efforts to
model and animate the human figure are achieving
more and more success. Obviously, this will converge
upon the next challenge: designing and animating
clothes for this figure.

Without a practical means for representing 
three-dimensional garments, computer graphics design 
systems in the industry have been geared towards 
manipulating two-dimensional designs. The thrust of 
these systems is to provide (1) an efficient means for 
visualization and (2) an optimal way to create a 
design to be readily used in a CAM system. 

Figure 1 Suit jacket with texture map.

Textile design is an area related to the fashion
industry where computer graphics are widely
employed as a design tool. A separate activity from
garment construction, textile design requires only
two-dimensional representations. Weaving and textile
design have long been familiar to computer graphics.
In the early 19th century, Joseph Jacquard developed
punched cards as a means to control weaving 
machines in textile mills. Charles Babbage used these 
as a model for the cards he used to control the 
sequencing operations of his calculating machine [7]. 
Now, there are computer-interfaced looms which can 
control the pattern and handle complex designs as 
easily as simple ones. Computer imaging systems can 
analyze and visualize weave structures. Designs can 
quickly be scaled, repeated, recolored, and modified 
without having to laboriously redraw them by hand. 
The new design can then be produced using 
computer-interfaced looms. 

For example, a textile designer can create textiles 
with a microcomputer by means of the AVL Loom. 
Textile designs can be generated in full color with the 
AVL software automatically calculating an accurate 
representation of the warp and weft. Via an RS232 
connection, the AVL then can control the pattern as 
the loom is actually weaving it off, draw the pattern 
with a color plotter, or produce a black and white 
printout. 
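
How a weave representation follows from the warp and weft can be
shown with a generic textbook drawdown. This sketch is not the
AVL software: it assumes a simple twill in which each weft pick
passes under a fixed number of warp ends and over the rest, with
the pattern shifting by one end per pick. All names are
hypothetical.

```python
def twill_warp_up(pick, end, over=2, under=2):
    """True where the warp thread lies on top at this crossing."""
    return (end - pick) % (over + under) < over


def drawdown(warp_colors, weft_colors, over=2, under=2):
    """Visible color at each crossing: warp color if warp is up, else weft."""
    return [[warp_colors[end] if twill_warp_up(pick, end, over, under)
             else weft_colors[pick]
             for end in range(len(warp_colors))]
            for pick in range(len(weft_colors))]


# Four dark warp ends ("B") crossed by four light weft picks ("w").
cloth = drawdown(["B"] * 4, ["w"] * 4)
rows = ["".join(row) for row in cloth]
```

Printing `rows` line by line shows the diagonal rib
characteristic of a twill; recoloring a design amounts to
changing the two color lists and recomputing the drawdown.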

The direction of the industry has been to tighten
the link between design and manufacturing phases. In
part, this is because the most efficient way to give
pattern data to a CAM system is to create the
garment on a compatible design system. Thus, systems
dedicated to the design phase have tended to evolve
closely with those for manufacturing.

A system for designing textiles and previewing
them on garments has been developed by Computer
Design, Inc. (CDI). Compatible with traditional design
methods, this system is a tool for rapidly visualizing
design variations without extensive sample and sketch
making. The data describing designs can
subsequently be manipulated with a variety of CAM
systems. The menu-driven system uses an Iris
Workstation, from Silicon Graphics, Inc., which has a
68000-based processor, Geometry Engine, and raster
subsystem; this system has 1024 x 1024 resolution and
24 bits of color. A more powerful version uses the Iris
Turbo, which has a 68020-based processor. Fabric
designs can be created with the system’s paint
program, using an electronic tablet or mouse. The
system can calculate the number of threads, warp and
weft, for a given weave (such as twill or oxford) and
yarn size. Characteristics such as color, scale, and
repeats can be readily manipulated. Fabrics can then
be mapped onto garment patterns.

One of the system’s features is that designs from 
other sources, such as samples and sketches, can be 
scanned in with a camera and then manipulated by 
the paint system with elaborate functions. Using the 
paint system and mapping functions, a design from 
any source can be viewed instantly in a repertoire of 
colors, patterns or textures. 

Although the system works only with
two-dimensional images, its functions preserve the
dimensional effect created by textures or shadows in a
drawn or photographic image of a garment. When
the garment is specified in a new color, a transparency
function preserves these visual cues which suggest
folds and curves. The texture mapping also takes into
account light intensities so that the fabric pattern is
realistically mapped onto the garment.
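
One simple way to recolor a garment while keeping its shading
cues is sketched below. This is an assumption made for
illustration, not CDI's transparency function: each pixel's
brightness relative to the garment's base color is measured and
applied to the new color, so folds and shadows stay darker than
lit areas. The luminance weights are the common Rec. 601 values;
the function names are hypothetical, and the base color must not
be black.

```python
def luminance(rgb):
    """Approximate perceived brightness of an 8-bit RGB color."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def recolor(pixel, base_color, new_color):
    """Apply new_color to a pixel, scaled by the pixel's original shading."""
    shade = luminance(pixel) / luminance(base_color)
    return tuple(min(255, round(c * shade)) for c in new_color)


# A red garment recolored blue: a lit pixel and a pixel inside a fold.
lit = recolor((200, 0, 0), (200, 0, 0), (0, 0, 180))
fold = recolor((100, 0, 0), (200, 0, 0), (0, 0, 180))
```

Here the pixel inside the fold comes out half as bright as the
lit pixel, just as it was in the original image.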

In many respects, the role of computer graphics 
in the fashion industry’s design and presentation 
phases is similar to its role in movies, television, and 
advertising. Its use in these areas is not always cost- 
effective. For lines of clothes which are not mass- 
produced or mass-marketed, using computer graphics 
for design alone may be too expensive. Furthermore, 
cost alone may not be an adequate motive for a 
designer to make use of the technique. Thus, the role 
of computer graphics in creating designs in many 
ways remains subject to the public’s demands for 
aesthetic effects. 

In many ways, haute couture has an image which
is antithetical to automation and thus does not seem
likely to embrace technology in any highly visible
phase of its production. It is in ready-to-wear fashion,
an industry directed to the “modern” woman, that
computer graphics may prove to be in demand. There
are some instances of designers who actually reflect
the image of computer technology in their work rather
than using it as a hidden process behind it. The
designer Jurgen Lehl has used computer chips and
highly pixellated images as patterns in his
individualistic and symbolic textile designs [8]. The fashion
designer Elisabeth de Senneville bases her line on the
theme of new images. Her dresses, sweat shirts, and sweaters
often have prints with computer and video images.
She explains that when she designs a garment she
always asks herself if it could be worn in the year
2001.

Pattern Creation / Manufacturing 

In the fashion industry, the adaptation of
computer imaging techniques in the area of manufacturing
is more prevalent than in its other phases. The
overall goal is to automate and manage quality
control in all phases of design and production. Computer
graphics appear in the form of CAD/CAM systems for
digitizing, grading, marking, and sizing patterns in
two dimensions. These are formulaic and mechanical
operations which do not rely on creative decisions.

Systems are designed to depend on expertise in
pattern marking or marker grading, not computer
knowledge. The Lectra System CAD/CAM systems
are an example of a highly modular system in which
each unit is dedicated to a particular subset of the
traditional tasks. Each component has a 68000-based
processor and disc storage that can range from 256 to
1024 kbytes. Pattern pieces are input on a digitizing
table. Next, the pieces can be manipulated (graded,
sized, and arranged into markers) in real time on a color
vector graphic display workstation with a pen and tablet.
The pieces can also be “dragged” around and
magnified on the display in real time. The plotter
unit is used for either single-ply laser cutting or
drawing markers on spread fabric. A Winchester-based
module with a networking controller provides extra
storage. Modules can communicate between different
locations over telephone lines by means of a modem.
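
Grading, one of the operations named above, lends itself to a
short illustration. The sketch below is a generic picture of rule
based grading and is not taken from the Lectra system: it assumes
each digitized point of a pattern piece carries a grade rule, an
(dx, dy) offset applied once per size step. The names and
measurements are hypothetical.

```python
def grade_piece(outline, grade_rules, size_steps):
    """Shift every outline point by its grade rule times the size step count."""
    if len(outline) != len(grade_rules):
        raise ValueError("one grade rule is needed per outline point")
    return [(x + dx * size_steps, y + dy * size_steps)
            for (x, y), (dx, dy) in zip(outline, grade_rules)]


# A rectangular panel (in cm) digitized counter-clockwise from its lower left.
base_size = [(0.0, 0.0), (40.0, 0.0), (40.0, 60.0), (0.0, 60.0)]
# Assumed rules: 0.5 cm wider and 1.0 cm longer per size.
rules = [(0.0, 0.0), (0.5, 0.0), (0.5, 1.0), (0.0, 1.0)]

two_sizes_up = grade_piece(base_size, rules, 2)
one_size_down = grade_piece(base_size, rules, -1)
```

Because the rules are fixed per point, a whole size range can be
produced from one digitized base pattern without redrawing it.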

The next step in manufacturing is a system
which can model three-dimensional garments and then
unfold them into two-dimensional patterns for
subsequent manipulation on a CAM system. GGT is
developing a three-dimensional modeling system that
can display rigid models of garments in high
resolution (1280 x 1084), using z-buffer algorithms for
hidden surfaces, and Gouraud shading. While it may be
expensive, in computer time alone, to accurately
model many of a garment’s physical characteristics
(such as draping), their impact on the geometric
layout must be evaluated and accounted for in the
pattern. GGT is currently researching how to reflect
engineering concerns, including the effects of gravity
and warping of a flexible surface, when translating the
model from three dimensions to a two-dimensional
pattern.
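
The two rendering techniques mentioned here can be illustrated on
a single scanline. This is a textbook sketch, not GGT's
implementation: each span linearly interpolates depth and
intensity between its end points (the interpolation step of
Gouraud shading), and a pixel is written only if the new depth is
nearer than the stored one (the z-buffer test). Smaller z is
assumed to be nearer; all names are hypothetical.

```python
def draw_span(zbuf, shade, x0, x1, z0, z1, i0, i1):
    """Rasterize one span with interpolated depth and intensity."""
    for x in range(x0, x1 + 1):
        t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        z = z0 + t * (z1 - z0)
        if z < zbuf[x]:  # z-buffer test: keep the nearest surface
            zbuf[x] = z
            shade[x] = i0 + t * (i1 - i0)


def render_scanline(spans, width):
    """Return per-pixel intensity; None marks untouched background."""
    zbuf = [float("inf")] * width
    shade = [None] * width
    for span in spans:
        draw_span(zbuf, shade, *span)
    return shade


far_span = (0, 4, 1.0, 1.0, 0.0, 1.0)   # x0, x1, z0, z1, i0, i1
near_span = (2, 6, 0.5, 0.5, 0.5, 0.5)
line = render_scanline([far_span, near_span], 8)
```

The result does not depend on the order in which the spans are
drawn, which is what makes the z-buffer attractive for hidden
surface removal.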

CDI is marketing a system for making shoe
designs and patterns which is compatible with its
design system described earlier. Shoes are
characterized by having rigid forms which closely fit the model.
Thus the problems posed by flexible surfaces are
minimal. A shoe form which is input on a
three-dimensional digitizer can be manipulated as a
three-dimensional, smooth-shaded image. The design is
then unwrapped, at which point it can go on to
standard pattern engineering and CAM systems.

CDI’s garment system, under development, also
begins with a digitized mannequin. The image can be
displayed from different points of view in four
windows on the color monitor. The designer draws style
lines which the system accurately maps to the surface;
changes are updated in all windows simultaneously
[Figure 2]. Mirroring functions can automatically
reflect designs.


The problems are more complicated for the
design of garments whose fabrics and construction do
not adhere to the mannequin’s shape. At present,
CDI is not attempting complete physical simulation of
garments, under the assumption that this level of detail
may have applications in engineering but is not
absolutely necessary in the garment industry.

Figure 2 Smooth-shaded three-dimensional image of
digitized mannequin with pattern lines.

Presentation / Promotion 

The technology for new ways of shopping
overlaps with the technology for new ways of entertaining,
promoting, educating, and informing. In many ways,
its role is much as it is in the design phase: it is a tool for
visually exploring a database of images, editing, and
making final selections.

The cosmetics industry is one sector of the
fashion industry which explores this. New technologies
and cosmetics have a certain compatibility. “Science”
is perceived as the means for attaining a variety of
technological “miracles” that range from anti-aging
products to long-lasting lipsticks. Most people
welcome the association of technology with cosmetics
because they perceive it as being particularly accurate
and scientific. In the past year, “makeup computers”
have been introduced by Elizabeth Arden and two
Japanese companies, Shiseido and Intelligent Skincare
(I.S.).

These systems offer two kinds of services:
computerized techniques for (1) makeup application and
(2) skin analysis. Basically, the makeup systems use a
standard computer graphics technology, a paint
system, as an innovative way to present the makeup.

All of these companies have systems which
provide environments where a trained makeup artist can
simulate the application of “electronic makeup” to
the customer’s face. For instance, Elizabeth Arden’s
Single Unit System called Sue (refined from its
original 3-station system, Elizabeth) houses all of its
components in a tall unit which is similar in appearance
to a commercial video game. The customer’s face is
instantly scanned in, by means of a standard video
camera, and digitized. The screen’s 512 by 512 raster
image is divided into four quadrants: one for
displaying the initial image of the customer’s face and the
others for three different makeup applications. A
menu provides palettes for blending suggested colors
and selecting brush type and stroke firmness. Software
can magnify the face to focus on details. A
transparency function maintains the face’s original texture
while makeup is applied. At the end of the session, a
print-out with suggested makeups scrolls out from a
slot on the unit’s side.
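
The screen layout and the transparency function can be pictured
with a few lines of code. This sketch rests on assumptions that
the description above does not confirm: the four quadrants are
taken to be equal 256 by 256 tiles numbered left to right and top
to bottom, and the transparency function is modeled as a plain
alpha blend of a makeup color over the skin pixel. The names are
hypothetical.

```python
SCREEN = 512
HALF = SCREEN // 2


def quadrant_origin(index):
    """Top-left pixel of quadrant 0..3 (0 = original face, 1..3 = variants)."""
    return ((index % 2) * HALF, (index // 2) * HALF)


def apply_makeup(skin, makeup, opacity):
    """Blend a makeup color over a skin pixel; opacity 0 leaves skin unchanged."""
    return tuple(round((1.0 - opacity) * s + opacity * m)
                 for s, m in zip(skin, makeup))


skin_pixel = (200, 150, 120)
lipstick = (180, 30, 60)
light_touch = apply_makeup(skin_pixel, lipstick, 0.5)
```

With a partial opacity, variations in the underlying skin pixels
still show through the applied color, which is the effect the
transparency function is described as preserving.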

The systems cost each company about $1 million
to develop and thus may not be offered by every
cosmetics company. They have had an enthusiastic
response. Shiseido, for instance, quintupled its sales
when it introduced its system at Bloomingdale’s in
New York City in the autumn of 1984 [9].

The appeal of these computer graphics systems is
that they allow the customer to interact with the
product by means of a simulation. Elizabeth Arden’s
marketing department has observed that potential
customers are often reluctant to experiment or invest
in a new product. With electronic makeup, a
customer can experiment not only without being
threatened, but without removing the makeup being
worn.

The skin analysis system demonstrates the
importance of having a human tending the interface
between “science and beauty”. Based on a personal
computer, Arden’s skin analyzer scans in the
customer’s skin with a black and white video camera.
The computer constructs a three-dimensional image of
the skin surface texture from light intensity
information. The data is normalized with additional
information such as the customer’s age and skin type.
Originally, the system assigned a numerical value to the
skin type and recommended specific products using a
logic system with a hierarchical database of 94
products and regimes. However, an uninterpreted
numerical skin rating can alienate a potential customer.
The computer thus works as an assistant to the
consultant by performing the skin analysis and
recommending an overall product line.

The notion of using simulations with which a
customer can interact has been explored in the area of
fashion as well. The Magic Mirror, designed in Paris
by Jean-Claude Bourdier, is an interactive system
which uses simulation to create the visual effect for a
customer that he is “trying on” clothes. The system
wittily updates the traditional interface between
customer and garment: the mirror. Set in an
environment that is like a darkened dressing room, the Magic
Mirror optically combines the reflection of the
customer’s face with the reflection of a slide projection
of a clothing outfit. In the L.S. Ayres store in
Indianapolis and Galeries Lafayette in Paris, a trained
operator uses a computerized focusing system with a
pushbutton interface to project the slide at a scale
that fits the customer’s figure (size and height). In
stores in Japan, the customer simply operates the
Magic Mirror himself. Although the system is
primarily based on “low” technology, it is easy to
imagine the visual database updated so that it is accessed
from a videodisc or computer generated.

The video image and fashion are not a novel
combination. Often in the form of “news magazines,”
fashion reportage typically consists of interviews
intercut with videotapes of fashion shows which
concentrate on good photography of runway presentations.
Basically, these presentations are documentaries which
often have a canned look. Videotapes are frequently
running in various departments of stores. And of
course, television broadcasts a barrage of commercials
for cosmetics, perfumes and blue jeans. Adding
computer graphics and interaction can give the customer
a fresh look at the product.

The Fizzazz store, designed by Music Video
Productions, uses interactive computer graphics systems
to present Murjani International’s line of Coca-Cola
clothes. The centerpiece of the store is a pair of
television displays on which a customer can browse
through images of the clothes. Each system appears to
be simply a television, sheathed in a pared-down
chrome housing which is mounted on a slim column.
Transparent touch-sensitive plasma screens cover each
of the screens. A computer-generated graphical menu
is composited with the video images; special hardware
was designed by Sony to composite the videodisc
(NTSC) and computer graphics menu (RGB). The
interface is direct: no keyboards, cursors, mice or
computer languages. The hardware is placed behind the
glass windows of Coca-Cola coolers, off near the
dressing rooms.

Customers call up pictures of the clothes which 
are stored on videodisc by touching menu items on 
the screen. The images are drawings of the garments 
depicted in various colors and from different points of 
view (front and back) and at various levels of detail
(pocket, cuff, sleeve, etc.). They can be viewed by 
touching the appropriate selection on the screen. The 
presentation is geared for a young market which is 
already used to browsing television for visual stimuli 
and fashion ideas. Multiple video projection systems 
display computer graphics animations and stills from 
the videodiscs on a wall. Visible from the street, the 
video systems run twenty-four hours a day and the 
animations are changed weekly. Murjani is gearing 
the store towards twenty-four hour shopping where 
the customer electronically looks through the clothes, 
purchases by means of a credit card, and has the 
goods delivered the next day. The designers of the 
system note that it is important that the clothes are 
available in the store to be handled and tried on. 

Like the makeup systems, fashion presentation 
systems introduce customers to a broader “database” 
of products than they might otherwise consider — and 
give them a tool to manage it. These systems put to 
use the same shopping techniques exercised without 
technology: collecting and editing data with the goal 
of making a purchase. 

Developing the notion that interactive computer
graphics can provide a medium that can evoke the
“presence” of the garment, the author designed a
system (at the Architecture Machine Group,
Massachusetts Institute of Technology) where a joystick
can be used to interactively rotate a garment. This
was accomplished by photographing a coat [10] from a
series of points of view. The photographs were
digitized and laid out in a grid as one picture. This was
then displayed in an AED framebuffer which could
pan to and zoom up the appropriate image in real
time as the viewer moves a joystick to indicate the
direction from which he wants to view the garment.
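
The lookup behind such a display can be sketched as follows. This is
only an illustration under assumptions that the text does not state:
the views are taken at equal angular steps around the garment, the
tiles are stored in row-major order in the grid picture, and the names
tile_for_angle, n_views, cols, tile_w and tile_h are hypothetical.

```python
def tile_for_angle(angle_deg, n_views, cols, tile_w, tile_h):
    """Pick the grid tile nearest to a requested viewing angle.

    Returns (view index, (x, y) pan offset of the tile's top-left corner).
    Assumes n_views photographs at equal angular steps, stored row-major
    in a grid that is `cols` tiles wide.
    """
    # Nearest view index, wrapping around at 360 degrees.
    view = round((angle_deg % 360) / 360 * n_views) % n_views
    # Row-major position of that view inside the grid picture.
    row, col = divmod(view, cols)
    # The frame buffer would pan to this offset and zoom one tile to full screen.
    return view, (col * tile_w, row * tile_h)


# Hypothetical layout: 16 views in a 4 x 4 grid of 128 x 128 pixel tiles.
print(tile_for_angle(90, 16, 4, 128, 128))
```

With this layout a joystick deflection is reduced to an angle, and the
angle to a pan offset, so no image data has to be moved while the
viewer turns the garment.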

In another project, the same database of multiple
images could be manipulated interactively in a
stereoscopic computer graphics workstation [11, 12].
The workspace combines a stereoscopic video display
with a real space by means of a semi-transparent
mirror. Images were placed and moved in the space using
an electromagnetic 6 degree of freedom digitizer
mounted on a small “wand” with a pushbutton
mounted in the handle [Figure 3]. The viewer could
browse through different views of the garment which
he had placed in the workspace [Figure 4].

When the modeling techniques and adequately
powerful technology become available, such systems
can also be imagined as a tool for the designer to
preview a design. Rather than working with a database
of digitized pictures, the designer creates a
three-dimensional model of a garment which can then be
displayed in the workspace from any point of view, at
various levels of detail, and with colors and textures
defined on the fly.



Figure 4 Coat displayed from different viewpoints in 
workspace. Both left and right images of stereoscopic 
pair are visible.

One shortcoming of a video image of a garment is
that a person cannot touch the fabric and have direct 
exposure to its colors and its quality. Computer 
graphics must be able to create visual effects that 
evoke the immediacy of a garment. Fashion videos 
which use computer animation may be one such 
means. Unlike a video recording of a fashion show,
which can seem diminished on tape, animations can
be used to show fashion on the move. They can be seen
as an effective extension to the controlled illusion
created by fashion photography. “Attitude”, which in
many ways is the essence of any particular style, can
be evoked in an animation or by using computer
processed photographs intercut with live action.

In a fashion video produced by the author at 
Computer Graphics Lab, NYIT, still photographs are 
processed and animated. An experimental video was 
used to explore relatively low-cost techniques for using 
the computer to animate fashion images. The
database of images was just two still photographs of
models in clothes by the design company Body Map
and “soft fonts” [13]. The two images were digitized
and processed with Ikonas frame buffer software
developed at the Computer Graphics Lab. Pictures
were then zoomed up, blurred and sharpened with low
and high pass filters, and recolored by merging
different colormaps [see Figures 5-6]. Mattes were
produced from the pictures and used for selectively 
using parts of images and putting the images against 
different backgrounds. The mattes were also used for 
animated shadows. The type was seven letters from 
an alphabet of two-bit Clarendon soft fonts which 
were zoomed up, blurred, recolored and warped. 

UNIX shell scripts [14] were used to define the
animation sequences. Each movement was specified
as an interpolation between a beginning and end
position over a given number of frames. Interpolation was
determined by linear, ease in and ease out functions.
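
As a rough illustration of this kind of interpolation, the sketch
below generates per-frame positions between a start and an end value.
The quadratic form of the ease in and ease out curves, and the names
linear, ease_in, ease_out and interpolate, are assumptions made for
the example; the text does not say which functions the original
scripts used.

```python
def linear(t):
    """Constant speed."""
    return t


def ease_in(t):
    """Start slowly and accelerate (quadratic curve, assumed here)."""
    return t * t


def ease_out(t):
    """Start quickly and decelerate (quadratic curve, assumed here)."""
    return 1 - (1 - t) ** 2


def interpolate(start, end, frames, easing=linear):
    """Return one position per frame, from start to end inclusive."""
    if frames < 2:
        raise ValueError("need at least two frames")
    return [start + (end - start) * easing(i / (frames - 1))
            for i in range(frames)]


# A move from x = 0 to x = 100 over five frames with each easing function.
for fn in (linear, ease_in, ease_out):
    print(fn.__name__, interpolate(0, 100, 5, fn))
```

All three functions map 0 to 0 and 1 to 1, so every movement begins
and ends at the specified positions and only the pacing differs.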

Many of the techniques are similar to those used
to optically process photographs. The computer offers
the advantage of special filters, enhanced color
manipulation with a range of 24 bits of color, control of
level of detail (such as magnifying accessories and
fabric patterns), changing text, and most important,
movement. Many standard computer special effects
(such as gleams and glitters, highly reflective
surfaces, or wireframes) are not necessarily appropriate
to the style conscious subject matter. Other popular
techniques such as fly-through camera moves, which
rely on creating three-dimensional models, are not
inherently the best way to present the subject matter
and can be cost-prohibitive.

The control which the computer gives to the
video image is suitable for making an emphatic
statement. Rather than falling short of real life, the effect
of a computer animation can be thought of as a
caprice on real life, much as fashion is.

1. Introduction.

The interlacement structure exhibited by a woven textile
has traditionally, and conveniently, been represented by a
binary array. Graphically, this has taken the form of a Cartesian
grid with cells coloured either black or white [Figure 1]. This
binary structural array can, in fact, be considered as the product
of three matrix factors. These factors are normally also binary
matrices and, in most cases, their product arises as a result of
conventional matrix multiplication [1]. The factorization
process is of great practical interest, since these matrix factors
correspond to parameters for the production of the
corresponding structure [2], as well as being of considerable
theoretical interest. The development of fast, efficient factoring
algorithms has been considered and has resulted in the
processes described in [3,4].

FIGURE 1

In analysing an interlacement structure in this way, the
data structure is first created using interactive graphical input to
colour the cells of the corresponding grid [5,6]. When the data
structure is complete, the factorization algorithm is invoked and
the a posteriori matrix factors computed. The graphical display
is then updated to include their graphical representation [Figure 2].
The main advantage to this approach is that data design updates 
are rapid, since no analysis-processing takes place at the time of 
design creation. This was of particular importance in previous 
implementations on small low speed microprocessors, where an 
emphasis was placed on the design environment [7]. However, 
a serious disadvantage is that, as the structural array is 
modified, there is no continuous feedback as to the structural 
implications of design modifications on the corresponding 
matrix factors. In this paper, algorithms for performing
continuous dynamic factorization in response to incremental
data modifications are considered, along with the ensuing
implications for the corresponding graphical display.

2. Dynamic Analysis Model. 

The analysis process can be considered to consist of 
three distinct phases, corresponding to each of the three matrix 
factors. The first phase requires that all of the columns of the 
structure array be sorted into distinctness classes [3]. Each
distinct column is allocated a separate row in the threading 
matrix (top left in Figure 2), and all identical columns of the 
structure array have the single non-zero element in the 
corresponding columns of the threading array in the same row. 
The second phase of the algorithm requires that the rows of the 
structure array be sorted in the same manner, and that each 
distinct row be allocated a separate column of the shed sequence 
matrix (bottom right in Figure 2). The third phase involves 
determining a mapping between the distinct columns and rows 
of the structure array. This information is recorded in the tie-up 
matrix (top right in Figure 2), and can normally be generated 
simply as the intersection of one representative of each distinct 
column with one representative of each distinct row, without 
further processing.

The expensive part of this algorithm is the bit-wise
comparison of all the columns of the structure array. Although 
the rows also require a bit-wise comparison, having determined 
the distinct columns of the array, only one representative of 
each distinct column set need be considered and the number of 
elements in each row is greatly reduced. The sorting process 
can be represented graphically, very effectively, using a digital 
trie [7] for the columns and for the rows. 
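
The three phases described above can be sketched in a few lines. This
is a simplified illustration rather than the algorithm of [3]: classes
are found with a dictionary instead of a trie, and the orientation of
the matrices (threading with one row per distinct column, shed
sequence with one column per distinct row, tie-up indexed by threading
row and shed sequence column) is an assumption chosen so that the
factors multiply back to the structure array. The function names are
hypothetical.

```python
def classify(vectors):
    """Sort vectors into distinctness classes.

    Returns (class index of each vector, index of the first member of each class).
    """
    seen, classes, reps = {}, [], []
    for idx, vec in enumerate(vectors):
        key = tuple(vec)
        if key not in seen:
            seen[key] = len(reps)
            reps.append(idx)
        classes.append(seen[key])
    return classes, reps


def factor(structure):
    """Return (threading, tie_up, shed) for a binary structure array."""
    n_rows, n_cols = len(structure), len(structure[0])
    # Phase 1: classify the columns.
    columns = [[structure[i][j] for i in range(n_rows)] for j in range(n_cols)]
    col_class, col_reps = classify(columns)
    # Phase 2: classify the rows, looking only at one representative
    # of each distinct column, so each row is much shorter.
    reduced = [[row[j] for j in col_reps] for row in structure]
    row_class, row_reps = classify(reduced)
    k, r = len(col_reps), len(row_reps)
    threading = [[int(col_class[j] == a) for j in range(n_cols)] for a in range(k)]
    shed = [[int(row_class[i] == b) for b in range(r)] for i in range(n_rows)]
    # Phase 3: intersect the representatives to obtain the tie-up.
    tie_up = [[structure[row_reps[b]][col_reps[a]] for b in range(r)]
              for a in range(k)]
    return threading, tie_up, shed


def rebuild(threading, tie_up, shed):
    """Multiply the three factors back into the structure array."""
    k, r = len(tie_up), len(tie_up[0])
    return [[sum(shed[i][b] * tie_up[a][b] * threading[a][j]
                 for a in range(k) for b in range(r))
             for j in range(len(threading[0]))]
            for i in range(len(shed))]


# A 2/2 twill on 4 rows and 8 columns (hypothetical test pattern).
twill = [[int((j - i) % 4 < 2) for j in range(8)] for i in range(4)]
t, u, s = factor(twill)
print(len(t), "threading rows,", len(s[0]), "shed sequence columns")
print(rebuild(t, u, s) == twill)
```

Comparing rows only on the representative columns is safe because
every column is identical to its representative, so two rows that
agree on the representatives agree everywhere.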

In constructing the array in Figure 1, initially all of the 
cells are coloured white, all of the columns are in a single node 
of the threading trie, and all of the rows are in a single node of
the shed sequence trie. The single node, a leaf in this case, of 
the threading trie contains all of the column indices along with 
the binary sequence {00000000}, while the single node of the 
shed sequence trie contains all of the row indices along with the 
binary sequence {0}. As each change is made to the data 
structure, new branches are added, between the appropriate 
existing leaf and the new point of disagreement between two 
previously identical columns. The leaf column indices and 
binary sequences are also updated. After updating the threading 
trie, the shed sequence trie is modified. At each stage, the
binary sequences contained in the leaves of the shed sequence 
trie correspond to the columns of the tie-up matrix. The digital 
tries corresponding to the structure array of Figure 1 are given 
in Figures 3 and 4. 
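
A minimal digital trie that groups identical binary sequences might
look like the sketch below. It is deliberately simpler than the
structure described above: it stores one node per bit rather than
keeping a whole sequence in a single leaf and branching only at the
first point of disagreement, and it is rebuilt from scratch rather
than updated incrementally. The class and function names are
hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # bit (0 or 1) -> TrieNode
        self.indices = []    # column (or row) indices that end at this node


def insert(root, bits, index):
    """Follow (or create) the path spelled by bits and record index at its end."""
    node = root
    for bit in bits:
        node = node.children.setdefault(bit, TrieNode())
    node.indices.append(index)


def leaves(node, prefix=()):
    """Yield (bit sequence, indices) for every leaf, zero branches first."""
    if not node.children:
        yield prefix, node.indices
    for bit in sorted(node.children):
        yield from leaves(node.children[bit], prefix + (bit,))


# Columns of a 4 x 4 plain weave: only two distinct columns exist.
columns = [(0, 1, 0, 1), (1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 1, 0)]
root = TrieNode()
for j, column in enumerate(columns):
    insert(root, column, j)
for sequence, members in leaves(root):
    print(sequence, members)
```

Each leaf corresponds to one distinctness class, so the number of
leaves gives the number of rows needed in the threading matrix.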

It can be noted that the traditional binary tree graphical 
representation consumes a great deal of space, particularly as 
sequences become more complex. An alternative method of 
display is given by a combed trie representation, where all 
'zero branches' are vertical and where 'one branches' appear to
the right of any 'zero branches' at the same depth. Figures 5 
and 6 show a combed trie representation of the structure array 
of Figure 1. 

FIGURE 3

This data structure provides a means of rapidly updating
the matrix factors corresponding to a given structure array in
direct response to modifications to that array. It also supports
an excellent graphical representation of the analysis process
which can, in fact, be monitored to determine where changes to
the algorithm structure or implementation will result in
increased efficiency [8,9].

3. Some Implications for the User Interface. 

By using the digital trie data structure, the binary
structure array and its corresponding three matrix factors can
always be self-consistent. In addition, the rank of the factors
can always be minimal and thus, in some sense, optimal. This
presupposes, however, that the user will only update the
structure array and will never wish to modify any of the three
factors. This is definitely not the case. One of the tremendous
utilities of such an interactive graphical system is the ability to
modify any one of the four components and receive immediate
visual feedback as to the implications of this action. This
requires that a hierarchy of data elements be maintained by the
software, with user-defined cells taking precedence over those
set as a consequence of the analysis algorithm. This means that
user-defined elements are not moved if, at some stage in the
factorization process, their corresponding structure array
column or row becomes identical to another one. While this
results in the factors not necessarily being of minimal rank, it
does cause the user interface to conform to the design principle
of predictability [10].

FIGURE 4

FIGURE 5

The consequences of this differential treatment of the
user-defined components of the partially determined factors
develop in complexity as the design process continues, and of
course ultimately begin to limit the choices of the user.
Contending with the accruing weight of these limitations places
an increasing burden on the interactive response capabilities of
the system and much further work needs to be done to resolve
these problems.

FIGURE 6


References.

1. J.A. Hoskins and W.D. Hoskins, ... Structural
Analysis of Woven Fabrics. Ars Combinatoria 11
(1981), 51-59.

2. J.A. Hoskins and W.D. Hoskins, Using a ...
Textiles. 1983 ACM Conference on Personal and Small
Computers, San Diego, California, 1983.

3. J.A. Hoskins and W.D. Hoskins, A Faster Algorithm
for Factoring Binary Matrices. Ars Combinatoria 16B
(1983), 341-350.

4. Janet Anne Hoskins, ... and Textile
Computer Graphics. Ph.D. thesis, 1985, University of
Manitoba.

5. J.A. Hoskins and M.W. King, Interactive Design of
Woven Textiles. Proceedings of the International
Computer Color Graphics Conference, Tallahassee,
Florida, 1983.

6. J.A. Hoskins and M.W. King, An Interactive Database
for Woven Textile Design. Textile Institute Annual
Conference "Computers in the World of Textiles",
Hong Kong, 1984.

7. Alfred V. Aho, John E. Hopcroft and Jeffrey D.
Ullman, Data Structures and Algorithms, (Reading,
Mass.: Addison-Wesley, 1983)

8. Marc H. Brown, A System for Algorithm Animation,
Proceedings of the SIGGRAPH '84 Conference,
Minneapolis, 1984, 177-186.

9. Gretchen P. Brown, Christopher F. Herot and David A.
Kramlich, Program Visualization: Graphical Support for
Software Development. Computer 18 (8), 1985, 27-35.

10. J.D. Foley and A. VanDam, Fundamentals of 


Vision Interface ’86 





AN ENCODING SCHEME FOR PRESENTATION GRAPHICS WITH ANIMATION 


Howard J. Ferch 

Dept. of Computer Science, University of Manitoba
Winnipeg, Manitoba, R3T 2N2 


ABSTRACT 

Bit-mapped raster graphics systems are 
used in the majority of personal computer 
systems. As the resolution of these 
systems increases, and the number of 
levels of grey-scale, or the number of 
colours, is increased, encoding of images 
becomes of greater concern, particularly 
for interactive systems, or those using 
animation. 

This paper presents a variation of
run-length encoding which decodes faster
than the conventional scheme while giving
a similar degree of space compression. Its
application to a menu-driven presentation 
graphics system is discussed. 

One particular advantage of this scheme is 
its adaptability to the types of animation 
possible on personal computer raster 
systems. 

KEYWORDS: raster graphics, encoding, 
animation 


INTRODUCTION 

Bit-mapped raster graphics systems are 
very commonly used, particularly on 
personal computers. These systems 
typically have a dual-ported frame buffer, 
which is accessible both to the display
hardware, and to the central processor (by 
mapping the frame buffer into the 
processor's address space) [FOLEY82]. As 
memory capacity and speeds go up, and 
costs go down, it is possible to have
higher resolution screen displays, and to 
display more grey-levels, or colours. 

The increased size of the frame buffers
means that images are larger, and hence 
occupy more disk space, take longer to 
read, and take longer to copy into the 
frame buffer. This has particular
implications for applications in which
speed is of major importance, such as
those which operate interactively, or
which contain animation.

One system making use of such technology 
is a menu-driven graphics presentation 
system being developed for Parks Canada. 
In this system, an IBM PC-XT equivalent is 
being used to provide interpretive 
information to park visitors. Each menu 
page is displayed in a resolution of 640 
by 400 pixels, with 16 colours possible 
for each pixel. Simple animation 
sequences are used to present the 
information. For example, to demonstrate 
the creation of sinkholes, an animation 
sequence shows rain falling, the water 
soaking into the ground and dissolving the 
gypsum, the ground being undermined, and 
then the ground collapsing. Since each 
image takes 128,000 bytes of storage, an 
encoding scheme is required to reduce the 
disk space, the main memory space, the 
disk transfer time, and the display time. 
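
The 128,000 byte figure follows directly from the display parameters
given above, as this short check shows. The function name image_bytes
is hypothetical, and the calculation assumes the pixels are packed
with no padding.

```python
def image_bytes(width, height, colours):
    """Size in bytes of an uncompressed bit-mapped image."""
    bits_per_pixel = (colours - 1).bit_length()   # 16 colours need 4 bits
    return width * height * bits_per_pixel // 8


print(image_bytes(640, 400, 16))   # 640 * 400 pixels at 4 bits each
```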


EXISTING ENCODING SCHEMES 

A large amount of attention has been paid 
to the problem of encoding graphical 
images. (See [HALL79] for an overview of 
such techniques). However, the majority 
of such schemes are not adequate for this 
particular application. Techniques such 
as Huffman encoding, which use variable 
length bit strings, are simply too slow 
for the computing power of a personal 
computer. Most image creation packages 
for personal computers (such as Apple's 
MacDraw [APPLE85]) encode a picture as the 
objects which were used to create it (e.g. 
as a sequence of line segments, polygons, 
etc.). This is the same approach that is 
used in presentation systems such as 
Telidon [CSA83]. Again, for real-time
animation functions, this approach is too 
slow. In fact, one of the early 
objectives of the Parks Canada system was 
that it was to be much faster than 
existing museum systems based on Telidon. 

A lesser degree of space encoding is
provided by run-length encoding, in which
an image is stored as a set of tuples,
each containing a colour (or grey-scale)
sequence, and an associated repetition
factor. Such a technique combines a
reasonable level of space compression
with a simple, and hence quick, decoding
algorithm. This paper presents a
variation of run-length encoding which
decodes faster, gives a similar degree of
space encoding, and is particularly
adaptable to the type of animation
described above.
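
For comparison, conventional run-length encoding can be sketched as
below. This is a generic byte-level version written for illustration;
it is not the run-length encoder that was measured in the results, and
the function names are hypothetical.

```python
def rle_encode(data):
    """Encode bytes as a list of (value, repetition factor) tuples."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand the tuples back into the original bytes."""
    out = bytearray()
    for value, count in runs:
        out.extend([value] * count)
    return bytes(out)


sample = b"\x56" * 5 + b"\x00\x00\x07"
encoded = rle_encode(sample)
print(encoded)
print(rle_decode(encoded) == sample)
```

The decoder must fetch and interpret every tuple before it can write
anything, and it is this per-tuple overhead that the scheme presented
here sets out to avoid.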


THE ENCODING SCHEME 

Instead of storing a bit pattern (usually 
a byte or a word), and a repetition 
factor, one can make use of particular 
characteristics of the central processor 
found in the IBM PC. In this processor
(the Intel 8088) [INTEL81], processor
instructions exist to replicate a byte or 
word through memory, or to copy a string 
from one location to another. A byte or 
word pattern to be replicated into the 
frame buffer may thus be encoded as the 
machine language instruction sequence 
required to place that pattern directly 
into the frame buffer. For example, the 
following instruction sequence puts the 
byte containing the hexadecimal value 56 
into 250 successive locations of memory 
(assuming that the appropriate segment 
registers have been loaded). 

    MOV AL,56H
    MOV CX,250
    REP STOSB

This instruction sequence occupies 7 bytes
of memory, and thus we can use 7 bytes to
encode 250 bytes of the image. Since this
instruction sequence automatically
increments an address register to point to
the next location in memory (in the frame
buffer in this case), we may follow it
immediately with another sequence which
inserts the next pattern of the image into
place, and so on. Thus, the entire image
may be encoded as a machine language
program, which, when executed, will
generate the original image directly into
the frame buffer.
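
To make the 7 byte count concrete, the sketch below assembles the
repetition sequence by hand. The opcode bytes are the standard
8086/8088 encodings of MOV AL,imm8, MOV CX,imm16 and REP STOSB; the
function name encode_fill is hypothetical, and a real encoder would
also need to set up the segment and destination registers, which is
omitted here.

```python
def encode_fill(value, count):
    """Machine code that stores `value` into `count` successive bytes."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    if not 0 < count <= 0xFFFF:
        raise ValueError("count must fit in the 16-bit CX register")
    return bytes([
        0xB0, value,                     # MOV AL,value
        0xB9, count & 0xFF, count >> 8,  # MOV CX,count (low byte first)
        0xF3, 0xAA,                      # REP STOSB
    ])


code = encode_fill(0x56, 250)
print(code.hex(), len(code))
```

Appending the output of successive calls yields a program that fills
consecutive runs, because REP STOSB leaves the destination register
pointing just past the run it has written.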

For portions of the image which have a 
large number of very small runs, we can 
use another sequence in the generated 
program. Having generated a portion of 
the image in the frame buffer, it is 
possible to copy any section of this into 
a later portion of the image, where the 
same sequence appears again. On the 8088,
an instruction sequence to do this
consists of:

    MOV SI,source offset
    MOV CX,length
    REP MOVSW

This sequence occupies 8 bytes of memory,
and also updates the appropriate address
registers. One other instruction sequence
is used. This is a variation of the
above, which inserts a new string into the
frame buffer, by copying it from a copy
which is placed in-line in the program, as
in:

            MOV SI,OFFSET LABELx
            MOV CX,length
            REP MOVS WORD PTR[DI],WORD PTR CS:[SI]
            JMP LABELy
    LABELx: DW the string
    LABELy:

This sequence occupies 11 bytes plus the
string length.

In order to apply these sequences, the 
following algorithm can be used. The 
original image is stored in main memory, 
as it will appear in the frame buffer. 
Starting with the first word of the image, 
successive words of the image are scanned 
to see if the image starts with a repeated 
word. If so, the appropriate machine 
language code is generated, and the next 
word of the image is scanned in the same 
fashion. If the word being examined is 
not repeated, then the section of the 
image that has been generated to date is 
examined to find the largest possible 
string matching that at the current 
location. Only if no matching string of
reasonable length exists is the third
alternative used, and then only long
enough to bridge the gap until a repeated
word, or a previously encountered string,
is again encountered.
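
The selection among the three sequences can be sketched as a greedy
planner. This is an illustration under stated assumptions, not the
implementation that was used: the threshold min_copy stands in for
"reasonable length", copies are not allowed to overlap the position
being written, the search for earlier matches is a plain quadratic
scan, and the operation names fill, copy and literal are hypothetical
labels for the first, second and third sequences.

```python
def plan(words, min_copy=4):
    """Split an image (a list of words) into fill, copy and literal operations."""
    ops, literal = [], []
    i, n = 0, len(words)

    def flush():
        # Emit any pending in-line string as one "literal" operation.
        if literal:
            ops.append(("literal", list(literal)))
            literal.clear()

    while i < n:
        # First choice: a repeated word.
        run = 1
        while i + run < n and words[i + run] == words[i]:
            run += 1
        if run >= 2:
            flush()
            ops.append(("fill", words[i], run))
            i += run
            continue
        # Second choice: the longest match in the part already generated.
        best_len, best_src = 0, -1
        for src in range(i):
            length = 0
            while (i + length < n and src + length < i
                   and words[src + length] == words[i + length]):
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len >= min_copy:
            flush()
            ops.append(("copy", best_src, best_len))
            i += best_len
            continue
        # Third choice: carry the word in-line until something better appears.
        literal.append(words[i])
        i += 1
    flush()
    return ops


def replay(ops):
    """Rebuild the image from the operations, as the generated program would."""
    out = []
    for op in ops:
        if op[0] == "fill":
            out.extend([op[1]] * op[2])
        elif op[0] == "copy":
            out.extend(out[op[1]:op[1] + op[2]])
        else:
            out.extend(op[1])
    return out


image = [0, 0, 0, 1, 2, 3, 4, 9, 1, 2, 3, 4, 5, 5]
operations = plan(image)
print(operations)
print(replay(operations) == image)
```

Each operation would then be translated into the corresponding
instruction sequence, so the plan is in effect the program listing.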


THE RESULTS 

Table 1 summarizes the results, as applied 
to three sample images from the 
presentation sequence. The first image is 
a landscape scene, the second contains a 
large amount of detail and a large amount 
of fairly small text, and the third is a 
map, with a fairly small amount of detail. 
This table gives the number of occurrences 
of each of the three instruction 
sequences, the maximum length of string 
generated by each, the total number of 
words generated by each type, the total 
resulting size of the machine language 
program, the space reduction factor, (as a 
percentage of the original size of 128,000 
bytes), and the size of the image when 
encoded using both an object 
representation, and using run-length 
encoding. As can be seen from the table, 
this encoding scheme does use more space 
than run-length encoding. However, the 
generation speed is approximately two 
times faster than using the run-length 
encoding, due to the lack of overhead 
spent decoding the tuples of the encoded 
image. For the three images used, the 
picture generation averaged 0.3 seconds, 
while for run-length encoding, the average 
time was 0.6 seconds.

                                   IMAGE 1   IMAGE 2   IMAGE 3
                                   =======   =======   =======
Number of repetition sequences         712       477       648
Max repetition size (words)           1367       530       902
Total number of words created        35008     13396     17824

Number of copied strings              2996      6521      3562
Maximum string size (words)            161       848       159
Total number of words created        27333     49432     44606

Number of new strings                  970       868       994
Maximum string size (words)             14        11        11
Total number of words created         1659      1172      1570

Resulting encoded size in bytes      37343     61000     41346
Percentage space occupied              29%       48%       32%
Number of bits per pixel              1.17      1.91      1.29

Size of the run-length encoding      22712     56168     26256
Size of the object encoding          17470       N/A      7870


TABLE 1 - ENCODING RESULTS


REFINEMENTS 

One problem exists with the given 
algorithm. On many of the display 
adapters using a frame buffer which is 
dual-ported between the display and the 
main memory, it is not possible to read 
from the frame buffer at any desired time. 
Instead, due to the interactions in the
hardware, it is necessary to read the
frame buffer only during the retrace
intervals, to be sure that the data is
correct. Thus the second sequence given
above must be modified somewhat. An
additional 2 bytes were added to call a
subroutine which waits for the retrace to
begin. In addition, the maximum length of
the move must be limited. Rather than 
doing this, a refinement to the algorithm 
was introduced. 

The simplest approach was to remove the 
use of the second sequence (copying 
already existing sequences). It was felt 
that this would probably lengthen the 
encoding somewhat, but the display time 
would be reduced. In fact, this did not 
happen. For images which did not contain 
a large amount of text, the encoding was 
actually smaller, provided that the 
encoding was done on a byte, rather than a 
word level. With this result, the third 
sequence (generating a new string) was 
also removed, with the same result. Thus 
the resulting encoding which is actually 
used is very simple. Runs of repeating 
bytes were encoded using only the first 
sequence, with an additional optimization 
using a special sequence for runs of 
length 1 byte, or 2 bytes. For the three
images given above, the resulting
encodings had sizes of 37878, 62717, and
38574 respectively. The overall total
size across all images was slightly
reduced, with no loss of speed.

One other aspect of interest for this 
encoding approach is its adaptability to 
other processors. The two major 
competitors to the 8088 processor are the 
Motorola 68000 [MOT80], and the National
Semiconductor 16000 [NAT83]. While the 
algorithm can be adapted to both of these 
processors, it cannot be done in as simple 
a fashion as it is for the 8088. Neither 
of these processors provides an
instruction which can replicate a value
through a range of memory locations. Thus 
a loop is required. In order to achieve 
similar compaction levels, this requires 
the use of an out of line subroutine, with 
a subsequent loss of speed. 


ANIMATION 

This encoding scheme is easily adapted to 
support the type of animation described 
earlier. To provide animation, a sequence 
of related images is generated. Then, the 
above encoding scheme is used to insert
the changes from each image to the next in 
turn into one machine language routine. 
In many cases, a slower transition from 
one image to another is desired, such as 
when scrolling text onto an image, or in 
having a river change its colour in a 
given direction, to portray flooding. In 
these cases, delays are added to the 
machine language code generated, to 
achieve the desired rate of speed.

Since dissolves from one scene to the next
occur very often, a suite of routines to 
provide different orderings and timings of 
dissolves has been created. In the most 
general case, the user may add a line of 
any shape and size to an image and request 
an ordered dissolve from one image to the 
next moving outwards from the given line, 
or moving inwards, and he may at the same 
time specify the speed of movement. 
Standard top to bottom, left to right, etc.
orderings are also provided. 

Another adaptation allows for repetitive 
events, such as the blinking of an arrow. 
In this case, code to insert the arrow, 
and then to remove it, is generated and 
then the machine language routine is 
simply repeatedly executed. Delay 
sequences are added to achieve the desired 
rate of blink. 


CONCLUSIONS 

An encoding scheme for bit-mapped frame 
buffers which are memory mapped into the 
central processor’s address space has been 
described. This scheme provides for very 
fast decoding, while at the same time 
providing a reasonable level of space 
encoding.

A complete menu-driven interactive display 
system for Parks Canada has been 
constructed using this encoding scheme 
with the following results. A total of 
121 images have been encoded. Of these, 
65 primarily contain text (although text 
and graphics can be freely intermixed), 
and the remaining 56 primarily display 
landscape, or map information. In most 
cases, an image is generated from the 
previous one using a dissolve sequence. 
The 65 text images are encoded in 
1,166,141 bytes, for an average of 17,940 
bytes per image. The 56 graphics displays 
occupy 1,865,833 bytes, for an average of 
33,318 bytes per image. The resulting 
encodings, including the animation timing 
delays, and all overhead occupy 19.6% of 
the original space of the images. The 
display speed is just slightly reduced 
from the maximum achievable display speed.


KINEMATIC AND GEOMETRIC MODELLING AND ANIMATION OF ROBOTS


ABSTRACT

At present, most industrial robots are programmed in a 
teach mode. In the meantime, robots are called upon to perform 
increasingly complex tasks, which makes programming by 
teaching rather tedious and cumbersome. There is an increasing 
need for effective tools to assist in off-line programming of robots 
and verifying their programmed moves. 

A program for modelling and simulating the PUMA 560
and ADEPT I robots has been developed. The kinematic models
required for the transformation from task space to robot
configuration space, and vice-versa, are briefly outlined. The robot
geometric models and their graphical manipulation have been
implemented using MOVIE BYU and PLOT-10 on a Tektronix
4115-B graphics terminal and a VAX 11/730 minicomputer. The
developed kinematic and geometric models are used for off-line
simulation, animation and verification of robot movements
during assembly operations. Sample outputs which illustrate the
program capabilities such as shaded images, wire-frame
animation, arbitrary point tracing, actual movement envelope and the
various menus are included.

KEYWORDS: geometric modelling, robot kinematics, graphical 
simulation and animation, robot off-line programming. 


INTRODUCTION 

Robots have recently been gaining popularity and
acceptance as an efficient tool for accomplishing various
manufacturing tasks. Current applications include material handling,
welding, painting, deburring, etc. Robots’ greatest potential for 
increasing productivity is in the field of automated and flexible 
assembly. The key to their success in this area will be in 
integrating robots with adequate sensors and providing high 
level sophisticated programming tools. 

One of the major goals of our current research in the 
Centre for Flexible Manufacturing Research and Development at 
McMaster University is to develop systems which can auto¬ 
matically synthesize programs for controlling robots performing 
assembly tasks with sensor feedback. The input to these systems 
includes: 

a) specification of assembly tasks and goals, and 

b) specification of the initial state of both the robot and 
world models. 


An expert system written in COMMON LISP, currently under 
development, then uses the knowledge base and production rules 
to produce a plan for robot motions, and a robot level program in 
VAL II. The on-line expert system will allow updating of
preplanned robot moves in response to sensor input. 

Achieving this goal requires several building blocks, 
including: 

1) Robot geometric and kinematic models, 

2) Robot world geometric model, including parts and
tasks, 

3) Sensor models, 

4) Motion planner, 

5) Adaptive learning model, and 

6) Knowledge base and inference engine. 

Several research projects are in progress to develop these 
modules. This paper focusses on the generation of the kinematic 
and geometric models of the two robots used in the centre, and the 
links between these models and the rest of the modules 
mentioned above. 

The flexible robot assembly system, installed in the 
Centre for FMS research and development, consists of two robot 
work stations, an IRI vision inspection station, a load/unload 
station, and a computer-controlled Bosch conveyor, with pallets, 
for material handling. The two robots are a six-axis articulated 
PUMA 560 and a four-axis ADEPT I SCARA robot. Both robots are
interfaced with force, tactile and vision sensors for real-time 
adaptive control. The flexible assembly system is used for both 
research and development projects related to mechanical and 
electronic assembly. 


THE KINEMATIC MODELS 

Controlling and programming robots requires analytical 
models which relate the robot configuration space, expressed in 
terms of joint variables, and task space normally described in 
Cartesian coordinates. These models should allow efficient 
transformation between the two spaces since robots are usually 
controlled in the joint space while tasks are normally defined in 
the Cartesian space. 

A robot kinematic model consists of two parts: 

1) Forward Kinematics: which accomplishes the trans¬ 
formation from configuration space to task space, 
and 

2) Inverse Kinematics: which transforms representa¬ 
tions in task space to configuration space. 

Kinematic models may be used to transform positions, velocities 
and accelerations between the two spaces. 

Several methods have been developed in recent years to
obtain generalized and efficient solutions to these kinematic
models [1-8]. The emphasis has been twofold: a) concise and
elegant model representation, and b) fast and efficient solutions
to these models. The method and notation used in developing our
models are based on those developed by Denavit and Hartenberg
[1] as applied to robots by Paul [4]. The results are summarized
below.

PUMA 560 ARTICULATED ROBOT 

PUMA 560 is a 6-degrees-of-freedom articulated robot 
where the first three joints are used to place the manipulator 
wrist in its work envelope and the last three joints are used to 
orient the wrist. All robot joints are rotational.

The coordinate frames used to derive the kinematic 
models of the PUMA 560 robot are shown in figure 1. The 
kinematic parameters are summarized in Table 1. 



Table 1: PUMA 560 Configuration Constants

Joint #   Joint Variable   Joint Angles Range (degrees)
                           θi min.       θi max.
  1       θ1               -160°         +160°
  2       θ2               -π - 43°      +43°
  3       θ3               -52°          +π + 52°
  4       θ4               -110°         +170°
  5       θ5               -100°         +100°
  6       θ6               -266°         +266°

Joint #   ai (mm)    di (mm)    αi (degrees)
  1          0          0         -90
  2        431.81     149.09        0
  3        -20.31       0          90
  4          0        433.07      -90
  5          0          0          90
  6          0         56.25        0


Fig. 1 PUMA 560 coordinate frames and kinematic parameters.

The reader is referred to [9] for details of the development
of both forward and inverse kinematic model solutions of PUMA
type robots. This method was closely followed in developing our
PUMA 560 kinematic models.

ADEPT I SCARA ROBOT

ADEPT I robot is a 4-degrees-of-freedom manipulator
with two rotational joints (#1 and #2) and one rotational and
prismatic joint (#3). Figure 2 shows the kinematic model of the
ADEPT I and its link and joint parameters. Table 2 lists the arm
kinematic parameters.

Table 2: ADEPT I Configuration Constants

Joint #   Joint      Variable Limits        αi    ai (mm)   di (mm)
  (i)     Variable   min.       max.
  1       θ1         -150°      +150°       0     425       870
  2       θ2         -147°      +147°       0     375       223.5
  3       θ3         -277°      +277°       0     0         0
          d3         0          203 mm

Inverse Kinematics

Using Paul’s method, a solution for the inverse
kinematics model of ADEPT I was derived:

$$\theta_1 = \tan^{-1}\left[\frac{P_y - (\pm)\,P_x\sqrt{A^{-2} - 1}}{(\pm)\,P_y\sqrt{A^{-2} - 1} + P_x}\right] \qquad (1)$$

where

$$A = \frac{P_x^2 + P_y^2 + a_1^2 - a_2^2}{2\,a_1\,r} \qquad (2)$$

$$r = \sqrt{P_x^2 + P_y^2} \qquad (3)$$

and the (+)ve and (-)ve signs in equation (1) refer to the right
and left hand robot configurations respectively.

$$\theta_2 = \tan^{-1}\left[\frac{P_y\cos\theta_1 - P_x\sin\theta_1}{P_y\sin\theta_1 + P_x\cos\theta_1 - a_1}\right] \qquad (4)$$

$$\theta_3 = \tan^{-1}\left[\frac{n_y\cos(\theta_1 + \theta_2) - n_x\sin(\theta_1 + \theta_2)}{n_y\sin(\theta_1 + \theta_2) + n_x\cos(\theta_1 + \theta_2)}\right] \qquad (5)$$

where

$$n_x = \cos\theta_3(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + \sin\theta_3(-\cos\theta_1\sin\theta_2 - \sin\theta_1\cos\theta_2) \qquad (6)$$

$$n_y = \cos\theta_3(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2) + \sin\theta_3(-\sin\theta_1\sin\theta_2 + \cos\theta_1\cos\theta_2) \qquad (7)$$

$$d_3 = d_1 + d_2 - P_z \qquad (8)$$

where the actual linear displacement of joint 3 is equal to d3 - d3′
and d3′ is a physical offset (d3′ = 215 mm).

Fig. 2 ADEPT I coordinate frames and kinematic parameters.

Forward Kinematics 

The forward kinematics solution defines the Cartesian 
coordinates of the robot end effector in terms of the robot joint 
variables as follows: 

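$$T_3 = \text{orientation matrix} = \begin{bmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{bmatrix} \qquad (9)$$

where

$$n_x = \cos(\theta_1 + \theta_2 + \theta_3), \quad n_y = \sin(\theta_1 + \theta_2 + \theta_3), \quad n_z = 0 \qquad (10)$$

$$o_x = \sin(\theta_1 + \theta_2 + \theta_3), \quad o_y = -\cos(\theta_1 + \theta_2 + \theta_3), \quad o_z = 0 \qquad (11)$$

$$a_x = 0, \quad a_y = 0, \quad a_z = -1. \qquad (12)$$

The Cartesian coordinates are given by:

$$P_x = a_1\cos\theta_1 + a_2\cos(\theta_1 + \theta_2) \qquad (13)$$

$$P_y = a_1\sin\theta_1 + a_2\sin(\theta_1 + \theta_2) \qquad (14)$$

$$P_z = d_1 + d_2 - d_3 \qquad (15)$$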


The closed form solutions of the forward and inverse 
kinematic models of the PUMA 560 and ADEPT I were imple¬ 
mented in FORTRAN 77 and linked with a robot geometric 
modeller and simulator called ROBOT. The Cartesian and Joint 
Coordinates, calculated using the developed models, were 
compared with those measured using the two robots in all eight 
quadrants. The maximum absolute errors were in the range of 
0.0028% and 0.06%. 
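
The closed-form ADEPT I solution in equations (1)-(15) can be checked numerically with a short round trip from joint values to a pose and back. The sketch below is written in Python rather than the authors' FORTRAN 77, and it is an illustration only, not the ROBOT implementation. It takes the link constants from Table 2, replaces the printed tan^-1 expressions with an equivalent atan2/acos form so that quadrant information is kept, and uses hypothetical function names `forward` and `inverse`. Which value of the `sign` argument corresponds to the right hand configuration is an assumption.

```python
import math

# Link constants (mm) as listed in Table 2 for the ADEPT I.
A1, A2 = 425.0, 375.0
D1, D2 = 870.0, 223.5


def forward(theta1, theta2, theta3, d3):
    """Joint values -> (Px, Py, Pz, nx, ny), after equations (10), (13)-(15)."""
    t12 = theta1 + theta2
    px = A1 * math.cos(theta1) + A2 * math.cos(t12)
    py = A1 * math.sin(theta1) + A2 * math.sin(t12)
    pz = D1 + D2 - d3
    return px, py, pz, math.cos(t12 + theta3), math.sin(t12 + theta3)


def inverse(px, py, pz, nx, ny, sign=1):
    """Pose -> (theta1, theta2, theta3, d3), after equations (1)-(5) and (8).

    sign=+1 or -1 selects one of the two arm configurations; the mapping
    of these values onto "right hand" and "left hand" is assumed here.
    """
    r = math.hypot(px, py)                                         # eq. (3)
    a = (px * px + py * py + A1 * A1 - A2 * A2) / (2.0 * A1 * r)   # eq. (2)
    a = max(-1.0, min(1.0, a))                 # guard against round-off
    theta1 = math.atan2(py, px) - sign * math.acos(a)              # eq. (1)
    c1, s1 = math.cos(theta1), math.sin(theta1)
    theta2 = math.atan2(py * c1 - px * s1,
                        py * s1 + px * c1 - A1)                    # eq. (4)
    c12, s12 = math.cos(theta1 + theta2), math.sin(theta1 + theta2)
    theta3 = math.atan2(ny * c12 - nx * s12,
                        ny * s12 + nx * c12)                       # eq. (5)
    d3 = D1 + D2 - pz                                              # eq. (8)
    return theta1, theta2, theta3, d3


if __name__ == "__main__":
    joints = (0.3, 0.7, -0.4, 100.0)   # radians, radians, radians, mm
    # Recovers `joints` up to floating-point round-off.
    print(inverse(*forward(*joints), sign=1))
```

A configuration with a negative elbow angle is recovered with `sign=-1`, which mirrors the two-branch choice expressed by the (±) sign in equation (1).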


GEOMETRIC MODELLING 

Robot motions are not easy to visualize or verify using the 
mathematical models only. Interactive graphical simulation of 
robot motion is a powerful tool for verifying robot programs off-line
and evaluating the robot interaction with surrounding
objects. It provides for rapid error checking and assists in laying 
out the robot work cell. 

The geometric modelling system MOVIE BYU was used 
to generate robots and world models. Wire frames were 
generated from the geometry data base and used for robot anima¬ 
tion to increase display speed. This procedure and the user’s 
menus were developed using GKS PLOT 10 and implemented on 
a VAX 11/730 minicomputer and a Tektronix 4115B colour dis¬ 
play. This approach significantly reduces the overhead normally 
required by MOVIE BYU, and maintains the rich details of the 
graphical model while producing relatively fast images for 
dynamic display. 

PROGRAM OVERVIEW 

ROBOT is an interactive program which performs 
graphical simulation of the kinematics of arbitrary open chain 
linkages such as robots. ROBOT is primarily menu-driven. All 
functions are selected from menus displayed on the right hand 
side of the screen. Keyboard input is only required when param¬ 
eter values or file names are needed within various functions. 
Menu selection is done via keyboard, thumbwheels and/or 
graphics tablet. ROBOT utilizes geometric files created using 
MOVIE BYU and kinematic configuration files corresponding to 
each robot. Currently both PUMA 560 and ADEPT I have been 
modelled geometrically and kinematically. The program 
performs the following functions: 

1) Reads geometric and kinematic data files corre¬ 
sponding to the selected robot. 

2) Allows the user to create or update the kinematic
parameters of the robot. 

3) Allows the user to specify the geometry and 
dimensions of end effectors and grippers. 

4) Accepts the Cartesian coordinates of the robot end 
effector, solves the inverse kinematic problem, pro¬ 
duces the corresponding joint coordinates and 
displays the resulting robot configuration. 

5) Accepts joint coordinates, solves the forward kine¬ 
matics problem, produces the Cartesian coordinates 
and displays the corresponding robot configuration. 
Joint parameters may be entered as absolute or 
relative values. 

6) Moves the robot to a new configuration either
instantly or in a sequence of successive configura¬ 
tions for animation. The number of steps provided 
by the user is used to create equal step sizes for each 
joint. Frames may either be left on the screen, 
resulting in a superimposed sequence of images, or 
cleared before the next one appears. 

7) Allows any point(s) in the geometry to be dis¬ 
tinguished and used to trace a path(s) in space 
during robot animation. This feature is useful in 
verifying the path traced by a tool or end effector and
in visually checking the collision of the selected part 
of the robot arm with other objects in the workspace. 
Vertical walls from the selected point trace may also 
be viewed to further clarify the 3-D path and work 
envelope. 

8) Allows the user to alter the view of the world. The 3- 
D graphical representation produces perspective or 
parallel projection according to the user’s selection. 
The user’s point of view is altered by rotating the 
robot coordinate system (about x, y or z axes) and 
centering the image on the display screen. 

9) Displays in the upper right hand corner of the screen 
all selected options in the form of ON and OFF 
switches which can easily be toggled to change the 
corresponding status. 

10) Continuously displays and updates the value of 
robot joint variables and highlights those violated 
by a given robot move. 

11) Produces a stereo pair for hard copy by generating 2
images of the current view. The first image is a perspective
projection from the left eye point of view
and the second is from the right eye point of view, 
which is reflected about the vertical axis in the 
projection plane. When both images are made into 
hard copies, they result in a stereo image when 
viewed simultaneously with the aid of a small plain 
mirror. 
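
Functions 6 and 10 in the list above can be sketched together: a move is split into a user-supplied number of equal joint steps, and each intermediate configuration is checked against the joint limits. The Python below is a minimal illustration under stated assumptions, not the ROBOT code. The names `interpolate` and `violated` are hypothetical, and the limits shown are the Table 1 values for PUMA 560 joints 1, 4, 5 and 6 only.

```python
# Joint limits in degrees, taken from Table 1 (PUMA 560 joints 1, 4, 5, 6).
LIMITS = {
    1: (-160.0, 160.0),
    4: (-110.0, 170.0),
    5: (-100.0, 100.0),
    6: (-266.0, 266.0),
}


def interpolate(start, goal, steps):
    """Return `steps` configurations ending at `goal`.

    Each joint advances by the same fraction of its total travel per frame,
    so every joint has its own constant step size.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [
        {j: start[j] + (goal[j] - start[j]) * k / steps for j in start}
        for k in range(1, steps + 1)
    ]


def violated(config, limits=LIMITS):
    """Joints whose value lies outside its allowed range (to be highlighted)."""
    return sorted(
        j for j, value in config.items()
        if not (limits[j][0] <= value <= limits[j][1])
    )


if __name__ == "__main__":
    start = {1: 0.0, 4: 0.0, 5: 0.0, 6: 0.0}
    goal = {1: 90.0, 4: 200.0, 5: -50.0, 6: 0.0}   # joint 4 exceeds +170
    for frame in interpolate(start, goal, steps=4):
        print(frame, "violations:", violated(frame))
```

Only the final frame reports joint 4 as violated in this example, since the intermediate values of 50, 100 and 150 degrees are still within range.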

Sample outputs are included to illustrate the various 
options and capabilities of the ROBOT program. 

ROBOT GEOMETRIC MODEL BUILDING 

The ROBOT program is a general purpose simulation and 
animation program capable of simulating any open chain type 
mechanism. The type of robot, its geometry, kinematics and 
physical constraints are defined in separate files. Therefore, a 
library of different robot models may be created for future use. 
Currently, only the PUMA 560 and ADEPT I robots have been 
completely modelled in our system. 

Each robot is represented as an assembly of robot links. 
Each link is constructed as a separate object using MOVIE BYU. 
The reference coordinate frame for each link is positioned at the 
link joint according to the Denavit and Hartenberg convention.
Therefore, the robot shape and dimensions are modelled as a 
collection of polygonal surfaces or lines in 3-D space. The 
geometry consists of Cartesian coordinates in RxRxR (called 
points or nodes) along with a connectivity relation on the nodes 
which is used to determine which lines are to be drawn in a wire 
frame model. Any redundant lines produced by the connectivity
relation are removed by the optimizing module in
ROBOT as soon as the robot geometry file has been read. Robot
geometry generated in this fashion is stored in files and accessed
later by ROBOT. Homogeneous transformation matrices describing
the relative translation and rotation between adjacent
link coordinate frames are used to transform the modelled links
into a complete robot configuration [10].
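
Chaining link transforms can be illustrated with a small Python sketch. It assumes the A-matrix form commonly associated with Paul's treatment of the Denavit and Hartenberg parameters, Rot(z, θ) Trans(0, 0, d) Trans(a, 0, 0) Rot(x, α); the text does not state which variant ROBOT uses, and the helper names `dh_matrix`, `matmul` and `link_frames` are hypothetical.

```python
import math


def dh_matrix(theta, d, a, alpha):
    """4x4 homogeneous transform between adjacent link frames (assumed form)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(m, n):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def link_frames(dh_rows):
    """Base-to-link transforms for a chain of (theta, d, a, alpha) rows."""
    current = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    frames = []
    for row in dh_rows:
        current = matmul(current, dh_matrix(*row))
        frames.append(current)
    return frames


def transform_point(m, point):
    """Map a point given in a link frame into the base frame."""
    x, y, z = point
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))


if __name__ == "__main__":
    # First two ADEPT I links, with a and d values from Table 2 and alpha = 0.
    rows = [(0.3, 870.0, 425.0, 0.0), (0.7, 223.5, 375.0, 0.0)]
    elbow, wrist = link_frames(rows)
    print(transform_point(wrist, (0.0, 0.0, 0.0)))
```

Every node of a link's geometry would be passed through that link's accumulated matrix before drawing, so a change in one joint variable moves all links further along the chain.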


CONCLUSIONS 

The developed kinematic and geometric models of the two 
robots used in our research program, namely PUMA 560 and 
ADEPT I, were described. The interactive graphics simulation 
program, ROBOT, proved very useful in displaying and verifying 
robot moves. This program will be integrated with a robot task 
planner which is currently being developed at McMaster 
University. It will certainly be very useful in checking out the
robot programs produced by the robot task planner based on
geometric and functional reasoning to achieve the stated goals.
This planner is being developed in COMMON LISP and the main
application domain is mechanical assembly. Other modules are 
being added to ROBOT to model objects in the workspace. 

Fig. 3 Examples of graphical outputs produced by ROBOT.


ABSTRACT

To design an intelligent graphics system, con¬ 
ceptual information has to be represented and 
reasoned about. This paper explores knowledge 
representation schema, tools and techniques which 
are necessary for creating such a system. We will 
present a system that allows us to relate the mean¬ 
ing of a picture to its graphic representation. Two 
major blocks of the system are responsible for 
abstract reasoning about a picture and for picture 
composition respectively. We will show how a 
Semantic Network formalism and Semantic 
Network-based reasoning can be employed to allow 
abstract reasoning about a picture, and how, based on
conceptual level information, the system reasons
about appropriate scene composition and image generation.
A smooth transition from conceptual level
information to the actual picture composition is also
achieved. We introduce a simple environment in
which we have tested our approach. Within the
constraints of this environment our system reasons
about abstract non-visual concepts, decides which 
physical objects should be displayed, and renders 
them into an image. 

KEYWORDS: 

Intelligent graphics system; Knowledge representation;
Temporal reasoning; Scene composition.


1. Introduction 

Current graphic systems place the tasks of scene 
composition and object formation on the user. Their 
performance is limited to the representation of an 
image that was fully determined by a programmer. 
The situation would improve if, instead of burdensome
testing of different versions of an image, the
user could obtain some assistance in deciding
about graphic presentation from an intelligent computer
graphics system.

In order for a system to provide such assistance, 
it should have some knowledge about the nature of 
the objects which the user wants to display. In many
cases, the user may wish to give only abstract 
specifications of the picture he wants to see. How 
does a system decide what to display and in what 
form it should be displayed? What kind of 
knowledge representation and reasoning mechanisms 
do we need to cope with this problem? How does 
one go from abstract specification of the picture to 
the actual image being generated? In this paper we 
will try to answer some of these questions and
present a design of a system which is responsible for 
the appropriate picture composition and generation. 

To create an intelligent graphics system, tools 
must first be developed which integrate techniques 
from the fields of Artificial Intelligence and Com¬ 
puter Graphics in a coherent manner [2]. Our system 
consists of two modules. Reasoning and determina¬ 
tion of the conceptual specifications for the picture is 
facilitated by the Semantic Network Processing Sys¬ 
tem (SNePS) [4]. SNePS allows us to create the con¬ 
ceptual knowledge base in the form of a semantic 
network and is capable of including rules of infer¬ 
ence in the same representation. It decides on the 
appropriate meaning of the picture and on the 
objects which should therefore be included in the 
picture. SNePS passes this information to a graphics 
module (Graflisp [1]) along with some set of restrictions
for the picture composition.

A graphics module is responsible for object gen¬ 
eration, transformation and display rendering. 
Graflisp maintains knowledge of object structures 
and of inter-relationships between objects. Objects 
bear a special class inheritance. This allows objects
to be comprised of any forms which can be func¬ 
tionally defined, rather than limiting them to be 
built-up from a polygonal base. Users may define 
and link their own classes to the classes which are 
currently defined (polygons, spheres, surfaces of 
revolution, etc.). In addition to the smooth shading,
realism and perceptual clarity are enhanced by the
use of color or texture patterns on the object’s
surface. The system can reference both the color
and texture mapping functions associated with an
object at render-time. This allows either the map¬ 
ping of images (from a camera or a previous image) 
or of a functionally defined artificial texture onto an 
object’s surface (see Fig. 1). 


2. Conceptual level 

Since we want to deal with various concepts, 
we have to represent them in our system. There¬ 
fore, we use a semantic network in which every con¬ 
cept is represented by a node. Such a semantic net¬ 
work formalism has been implemented by S. C. 
Shapiro [4]. The arcs between concepts represent 
relations of one concept to another concept. We can 
talk about the distance from one concept to
another, taking the shortest path via some set of arcs
as an intuitive “closeness” of two concepts. In general,
the meaning of every concept (i.e. node) in the
network is defined in terms of the rest of the network.
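
The notion of distance between concepts can be made concrete with a breadth-first search over the arcs. The Python sketch below is only an illustration and does not reflect how SNePS stores or traverses its network. The node names and arcs are hypothetical, and treating arcs as undirected and unweighted is an assumption.

```python
from collections import deque


def concept_distance(arcs, source, target):
    """Number of arcs on the shortest path between two nodes, or None."""
    neighbours = {}
    for a, b in arcs:
        # Arcs are treated as undirected for the purpose of "closeness".
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    arcs = [
        ("Ron", "puppet"), ("Bill", "puppet"),
        ("Ron", "White House"), ("Bill", "Regular House"),
        ("White House", "house"), ("Regular House", "house"),
    ]
    print(concept_distance(arcs, "Ron", "White House"))
    print(concept_distance(arcs, "Ron", "Regular House"))
```

In this toy network Ron is one arc from the White House and three arcs from the Regular House, so the White House is the "closer" candidate when an abstract reference to Ron's house has to be resolved.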

We will differentiate between abstract concepts 
and concepts of some physical objects. Whenever a 
user refers to some abstract concept, a correspon¬ 
dence should be found between this abstract concept 
and some physical object. To achieve this correspon¬ 
dence the domain-specific knowledge and “distance” 
factors are used. If the abstract concept corresponds 
to several physical objects, the system queries the 
user and stores the answer as a default choice for 
future reference. Later, if the system is faced with 
the same choice, the previous decision is made. The 
user, of course, may override the default. 

Sometimes, in addition to the objects directly 
specified by the user, we want to display some 
objects which are semantically appropriate in the 
image. For example, if somebody wants to see a 
drawing of the president’s home, he may expect to 
see related objects in the scene, such as the president 
in front of the White House. Thus, at the present
time, the system should display Reagan in front of 
the White House. However, if we ask the system to 
draw the same picture five years later, it will not be 
Reagan. Two different kinds of inference are present. 
A semantic inference tells us that when we want 
to see a picture of the White House, it may be 
appropriate to display the president in front of it. A
temporal inference tells us that at the present time
the president of the U.S. is Reagan. We will examine
both semantic and temporal reasoning in our system.


3. Puppets World 

To study what type of representation and infer¬ 
ence is necessary for integrating conceptual informa¬ 
tion with the actual picture generation we needed to 
have a domain which does not require a large body 
of common-sense knowledge. It should be rich
enough to have several abstract concepts and several
physical objects, and to allow temporal reasoning
to be tested. We have chosen Puppets World †. In
Puppets World several puppets live in several 
houses. The system knows a physical description of 
every puppet as well as a correspondence of each 
concept of a puppet to its 3-D graphical “body”. 
First, we create concepts (i.e. nodes) of several puppets.
We tell the system that every puppet has a
place to live and a place to work. Then, we intro¬ 
duce concepts of a home and a workplace. We 
assume that during every day of the week each pup¬ 
pet goes to his corresponding work and is at home 
during any other time. Thus, if the system needs to 
know who is present at a certain time and a certain 
location, the temporal reasoning mechanisms will 
have to be employed. 

We give the following information to the sys¬ 
tem: Ron, Bill and Jane are puppets. There are two 
locations: the White House and a Regular House. 
Bill and Jane live at the Regular House. Jane works 
at home and Bill works at the White House. Ron 
lives and works at the White House. 

We want our system to be able to generate pic¬ 
tures from the following types of queries: “Show 
me the house of Bill”; “Show me the home of Ron”; 
“Show me a picture of a puppet and a house”; 
“Show me a workplace of Jane”, etc. We would like 
the system to decide what various abstract concepts 
(like home and a workplace) correspond to. We also 
want our system to display appropriate puppets 
with every location if they are at this location at the 
given time. For example, we want the system to 
display a regular house and Jane if we ask “Show 
me the home of Bill” during Bill’s work-time. 


4. Semantic Inference 

After the “Puppets World” data is given to the 
system, we also have to tell the system about the 
relations among different concepts. The system 
knows in which house which puppet lives. We can 
now proceed to specify what the concepts of “work¬ 
place” and “home” mean: 


† This name was given as an analogy to the well-known
Blocks World.

Figure 1. Graflisp example of color mapping, light
modelling and hidden surface removal.

Figure 2. System’s response to a request to display a
Regular House.

Figure 3. Situation at the White House during off-hours.

Figure 4. System’s response to a user’s request to
display the White House.

Figure 5. Situation at the home of Bill during business
hours.

Figure 6. Situation at the workplace of Jane during
free time.

; For all x
;   if x is a puppet
;   then for all y
;     if y is a house where x lives
;     then y is a home of x.

(build avb $x
       ant (build member *x
                  class puppet)
       cq  (build avb $y
                  ant (build verb live
                             actor *x
                             place house
                             placeName *y)
                  cq  (build verb live
                             actor *x
                             place home
                             placeName *y)))



In the above rule, the “avb” arc stands for
“all-variables-bound” and represents a universal
quantifier. SNePS uses its rules in both forward
and backward reasoning. For this rule, an English
description is given first (lines starting with semicolons),
the actual SNePS User Language code follows,
and then the semantic network which is actually
built. In this semantic network rule, node m17
represents the entire rule. If the antecedent of the
rule has matches in the network (node m13), then
the consequent of the rule (node m16) is executed.
But the network under node m16 is also a rule.
Thus, the system looks for matches of the proposition
represented by the rule node m14 and asserts the
proposition under m15 with appropriate bindings
for x and y. A similar rule is created to represent
the fact that a place where a puppet works is his
workplace:


; For all x 
; if x is a puppet 
; then for all y 
; if y is a house where x works 
; then y is a workplace of x. 

Since the system has the initial knowledge of
places where people live and places where people
work, it will be able to deduce what a reference to
somebody’s home or workplace means. For example,
imagine that we ask the system to display a workplace
of Jane. The system will first check if Jane is
a puppet (which is given), then it will look for a
place where Jane works and bind the value of y to the
workplace of Jane. The graphics package will then
proceed to display a house, which is where Jane
works, realizing that it is the right house to display
(see Fig. 2).
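
The effect of the two rules in this section can be imitated outside SNePS with a few lines of Python. This is a plain forward pass over tuples, offered as a sketch only: it does not model SNePS nodes, arcs or its inference engine. The facts are those given for Puppets World, while the tuple layout and the names `derive` and `resolve` are hypothetical.

```python
PUPPETS = {"Ron", "Bill", "Jane"}

# (verb, actor, placeName) facts as stated for Puppets World.
FACTS = {
    ("live", "Bill", "Regular House"),
    ("live", "Jane", "Regular House"),
    ("live", "Ron", "White House"),
    ("work", "Jane", "Regular House"),
    ("work", "Bill", "White House"),
    ("work", "Ron", "White House"),
}


def derive(facts, puppets):
    """Apply both rules: a house where x lives is x's home, and a house
    where x works is x's workplace, for every x that is a puppet."""
    concept_for = {"live": "home", "work": "workplace"}
    return {
        (concept_for[verb], actor, place)
        for verb, actor, place in facts
        if actor in puppets and verb in concept_for
    }


def resolve(concept, actor, derived):
    """Places that an abstract reference such as 'workplace of Jane' denotes."""
    return sorted(p for c, a, p in derived if c == concept and a == actor)


if __name__ == "__main__":
    derived = derive(FACTS, PUPPETS)
    print(resolve("workplace", "Jane", derived))   # ['Regular House']
    print(resolve("home", "Ron", derived))         # ['White House']
```

The query for Jane's workplace resolves to the Regular House, which is the object the graphics module is then asked to draw.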


5. Temporal Reasoning 

We can express the rules that all puppets are at 
work during work-hours and at home during off 
hours: “If it is a freeTime, then a puppet must be at
home”; “If it is a workTime, then a puppet must be 
at work”. 


; For all x 
; if x is a puppet 

; then if currentState is workTime 
; then x is at his workplace. 

(build avb $x
       ant (build member *x
                  class puppet)
       cq  (build ant (build currentState workTime)
                  cq  (build verb currentlyPresent
                             actor *x
                             place workplace)))

The system may not know what a home (or a 
workplace) of x is. It just assumes that an actor is 
at some workplace or home. Then, it will have to 
deduce what the particular home or workplace of x 
is. (Two SNePS rules have been created to make these
inferences.) To decide whether it is a workTime or a 
freeTime, the next set of rules is built: 

(build avb ($day $hour $minute)
       ant (build name: isItWorkTime
                  dayOfTheWeek *day
                  hour *hour
                  minute *minute)
       cq  (build currentState workTime))

(build avb ($day $hour $minute)
       ant (build name: isItFreeTime
                  dayOfTheWeek *day
                  hour *hour
                  minute *minute)
       cq  (build currentState freeTime))

These two rules assert that it is a workTime or 
it is a freeTime if and only if a function node “isItWorkTime”
or “isItFreeTime” succeeds. The system
uses non-monotonic reasoning, so the state may 
change from workTime to freeTime. The system 
must not use old assertions about the time; instead, 
it must check the time again if it needs to. The sys¬ 
tem solves this problem by removing the temporal 
assertions each time, thus forcing the system to 
deduce them again. 
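
The retract-and-rederive strategy can be sketched in Python as follows. This is an illustration of the idea, not of SNePS internals: the class `TemporalKB`, the helper `present_at` and the 9:00 to 17:00 working hours are all assumptions, since the text only says that puppets are at work during work time and at home otherwise.

```python
from datetime import datetime

HOMES = {"Ron": "White House", "Bill": "Regular House", "Jane": "Regular House"}
WORKPLACES = {"Ron": "White House", "Bill": "White House", "Jane": "Regular House"}


def is_work_time(now):
    """Stand-in for the isItWorkTime function node (assumed 9:00-17:00)."""
    return 9 <= now.hour < 17


class TemporalKB:
    """Holds the current temporal assertion and never trusts an old one."""

    def __init__(self):
        self.assertions = set()

    def current_state(self, now):
        # Remove stale temporal assertions, then deduce the state afresh.
        self.assertions -= {"workTime", "freeTime"}
        state = "workTime" if is_work_time(now) else "freeTime"
        self.assertions.add(state)
        return state


def present_at(place, now, kb):
    """Puppets that should be drawn at `place` at time `now`."""
    where = WORKPLACES if kb.current_state(now) == "workTime" else HOMES
    return sorted(p for p, location in where.items() if location == place)


if __name__ == "__main__":
    kb = TemporalKB()
    print(present_at("Regular House", datetime(1986, 5, 26, 10, 0), kb))
    print(present_at("Regular House", datetime(1986, 5, 26, 20, 0), kb))
```

Asking about the Regular House at 10:00 yields only Jane, which matches the example of showing Bill's home during his work time; the same query at 20:00 yields Bill and Jane because the earlier workTime assertion has been discarded.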

We have tested our system by asking it 
different queries during different times of the day. 
During “freeTime” we ask the system to display the 
situation at the home of Ron (Fig. 3). During day
time we ask the system to show the situation at a
home of Bill (see Fig. 5). In the evening, we ask the
system to display the workplace of Jane (Fig. 6).

When SNePS needs to display a set of objects, it 
creates a process which queries Graflisp about its 
capability to display those given objects. If Graflisp 
is capable of displaying the given set of objects, it 
collects, orders, orients, composes and renders that 
given set into an image. Then Graflisp passes a suc¬ 
cess message to the SNePS process, which conse¬ 
quently enables SNePS to deduce that the conceptual 
request is displayable. If the description which
SNePS was provided with on a conceptual level cannot
be visualized in an image, Graflisp will be
unable to find the corresponding objects or rules 
necessary to compose the scene and will pass failure 
back to the SNePS process. 

Based on the information being passed from 
SNePS, the Graflisp module of the system is respon¬ 
sible for composing and rendering the image within 
the constraints of its view camera. The view camera 
determines the area of space to be viewed, the degree 
of perspective deformation, and the orientation of 
the image it will produce. 


6. Scene Composition 

After the system has inferred which objects are 
actually going to appear in the picture, it is the 
responsibility of the graphics module to arrange
these objects into a coherent scene and produce the 
image. To do this, first it must gather all of the con¬ 
ceptual constraints to be placed on the image and 
extrapolate them into three dimensional environ¬ 
mental constraints. The system starts this process 
by calculating a bounding hull for each object 
involved; these hulls are used to speed up rough calculations
and ensure that no two solid objects will
occupy the same space. 

The graphics module (Graflisp) starts to build 
an object hierarchy for the scene. The algorithm that 
it uses is very similar to the way in which a photo¬ 
grapher might shoot a picture of a table-top scene. 
After using the hulls to calculate how much space it 
will need, Graflisp creates a sufficiently large blue 
backdrop and a horizontal green surface to serve as 
the sky and grassy ground for the environment in 
which it will place the objects. It then places a 
simulated camera into that environment at what 
would be eye-level for a puppet, and adjusts the 
camera’s settings accordingly (lens angle, focal 
point, f-stop, etc). Additionally, a light source is set 
up above and behind the camera to serve as the Puppets
World sun.

Next, the graphics module utilizes production 
rules to implement the particular preferences of the 
user. For example, if the user prefers to see puppets 
to the left and buildings to the right in images, then 
the system will query its objects as to their object 
class and order them from left to right accordingly, 
relative to the camera. Given this left-right ordering, 
and allowing an equal amount of image space for 
each object, three dimensional constraint pyramids 
are extrapolated out from the camera’s film plane 
into the scene to serve as constraints on each object’s 
placement in the scene. Objects are then placed into 
the scene as near to the camera as is possible 
without the object’s bounding hull violating the 
object’s constraint pyramid. Once all of these con¬ 
straints are used to determine the objects’ place¬ 
ments, Graflisp completes structuring the scene 
hierarchy and uses a z-buffer algorithm to render 
the scene into an image as it would have been seen 
by the simulated camera. 
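
The placement step can be illustrated with a simplified two-dimensional version in Python. The sketch below is not Graflisp's algorithm. It assumes a camera at the origin looking along +z, treats each bounding hull as a sphere of known radius, considers only the horizontal extent of each constraint pyramid, and takes negative x as image left. The function name `compose` and the default 60 degree field of view are hypothetical.

```python
import math


def compose(objects, fov_deg=60.0, class_order=("puppet", "house")):
    """Place objects left to right in equal-width strips of the image.

    objects: list of (name, object_class, hull_radius).
    Returns {name: (x, z)} positions in camera coordinates.
    """
    # Order objects by class according to the user's preference.
    ranked = sorted(objects, key=lambda o: class_order.index(o[1]))
    n = len(ranked)
    half_width = math.tan(math.radians(fov_deg) / 2.0)   # film plane at z = 1
    placements = {}
    for i, (name, _cls, radius) in enumerate(ranked):
        # Rays through the two edges of this object's strip on the film plane.
        left = math.atan(-half_width + 2.0 * half_width * i / n)
        right = math.atan(-half_width + 2.0 * half_width * (i + 1) / n)
        bisector = (left + right) / 2.0
        half_angle = (right - left) / 2.0
        # Nearest distance at which the sphere just touches both rays.
        dist = radius / math.sin(half_angle)
        placements[name] = (dist * math.sin(bisector), dist * math.cos(bisector))
    return placements


if __name__ == "__main__":
    scene = [("Regular House", "house", 5.0), ("Jane", "puppet", 1.0)]
    for name, (x, z) in compose(scene).items():
        print(f"{name}: x={x:.2f}, z={z:.2f}")
```

With these inputs Jane lands in the left half of the image and the house in the right half, and the larger hull forces the house further from the camera so that it still fits inside its strip.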


7. Reasoning about the display 

How does the inference engine know whether
the abstract concepts have a physical counterpart, 
and if they do, whether the graphics module is capa¬ 
ble of displaying them? In order to perform further 
reasoning about the picture, the system has to have 
a way to assume that some picture elements have 
been displayed. A solution is to attach special Lisp 
functions to our rules, which will query the graph¬ 
ics module about its success or failure. SNePS func¬ 
tion nodes give a procedural attachment capability to 
the otherwise declarative style of SNePS programming.
To “prove” a function node, the system must
call the function which is associated with it. This is 
the actual SNePS-Graflisp interface. Here is one 
example of such a display rule: 


; For all x, y, and z 

; if x is working in a place z by the name y 
; then if process of drawing y succeeds 
; it must be the case that we displayed it. 

(build avb ($x $y $z)
       ant (build verb work
                  actor *x
                  place *z
                  placeName *y)
       cq  (build ant (build name: showPicture
                             placeName *y)
                  cq  (build action display
                             description (build verb work
                                                actor *x
                                                place *z
                                                placeName *y)
                             type *z)))

This rule allows the system to reason about a
particular action: displaying the workplace of one of
the puppets known to the system. The arc named
“name:” is a special system-predefined arc which
points to a function node. A function node creates a
process which calls a Lisp function with an argument
bound to y. The process can either succeed or
fail. If the above process succeeds, then the consequent
rule is asserted.

If we had used this rule in backward chaining,
the system would have tried to prove that for
some actor x, working somewhere at place z with
the “placeName” y, the picture had been displayed.
If none of the variables were bound, this would be
similar to the query: “Show me all the places of
work for all the puppets”. If, on the other hand,
some of the variables were bound, it would be a
reference to some specific instance of x, y or z. For
example, if x is bound to Ron and y and z are free,
this would be equivalent to the query: “Show me
the place in which Ron works”. To prove this, we
would have to prove that the entire nested rule is
valid. This rule would create a SNePS process, which
would call Graflisp. When Graflisp displays the picture,
the function node succeeds and returns true,
and then the final consequent would be asserted.
An English description of the rule for displaying a
puppet’s living place is provided below.


; For all x, y, and z 

; if x is living in a place z by the name y 
; then if process of drawing y succeeds 
; it must be the case that we displayed it. 


For example, if we ask the system to show the 
house where Ron lives, the White House will be 
displayed (see Fig. 4).


8. Conclusions 

The developed system allows us to relate the
meaning of a picture to its graphic representation.
Its two major blocks serve an intelligent computer
graphics system’s needs for reasoning about
the content of a picture, the picture composition, and
the image generation. By interlinking SNePS and
Graflisp, we have been able to obtain images originating
from abstract requests. We have introduced
a simple environment consisting of several
puppets living and working in several places (called
Puppet’s World). Using abstract concepts (such as a
home or a workplace), our system has successfully
deduced which objects should be displayed and has
displayed them.

In the next phase of this research, we plan to
incorporate default display rules that reflect user preferences.
To collect and generate such rules, we will use the
system described in [3]. We plan to develop an
interactive visual test generator and rule acquisition
package which can be used to customize default
display rules to the preferences of a particular user
as well as to the preferences of different classes of
users.


ABSTRACT

The Prolog language is a useful tool for geometric and 
graphics implementations because its primitives, such 
as unification, match the requirements of many 
geometric algorithms. We have implemented several 
problems in Prolog including a subset of the Graphics 
Kernel Standard, convex hull finding, planar graph 
traversal, recognizing groupings of objects, and boolean 
combinations of polygons using multiple precision 
rational numbers. Certain paradigms, or standard
forms, of geometric programming in Prolog are becoming
evident. They include applying a function to every
element of a set, executing a procedure so long as a 
certain geometric pattern exists, and using unification 
to propagate a transitive function. Certain strengths 
and weaknesses of Prolog for these applications are 
now apparent. 

RESUME

The Prolog language is a very useful tool for designing
geometric and graphics software, because its primitives,
such as unification, match the requirements of many
geometric algorithms well. We have solved several
problems in Prolog, including the representation of a
subset of the Graphics Kernel Standard, the determination
of convex hulls, planar graph traversal, the recognition
of families of objects, and the construction of boolean
combinations of polygons using high-precision rational
numbers. Certain paradigms, or standard forms, of
programming in Prolog are becoming evident. This is
true, among others, of applying a function to every
element of a set, executing a procedure as long as a
certain geometric pattern exists, and using unification
to propagate a transitive function. Certain strengths
and weaknesses of Prolog for these applications are
now apparent.

INTRODUCTION

The fifth generation logic programming language
Prolog [Clocksin81a, Coelho80a] appears appropriate
for research in geometry and graphics. Some examples
of its use in architectural design are given in
[Swinson82a, Swinson83a, Swinson83b]. Its use in
CAD has been evaluated in [Gonzalez84a]. Constructing
geometric objects from certain constraints is
described in [Bruderlin85a]. Over the past two years,
the authors of the present paper have implemented
several geometric and graphic problems in Prolog using
assorted machines. This paper describes our experiences,
including some paradigms of programming that
have appeared useful, and finally lists the advantages
and disadvantages of Prolog that we have experienced.

Over the last two years we have implemented
several graphics and geometric algorithms in Prolog,
totalling a few thousand lines of code, using four
different Prolog interpreters on four different computers.
The systems include:


Machine      Operating System            Prolog Version

IBM 3081     Michigan Terminal System    York (U.K.)
IBM 4341     CMS                         Waterlog
Prime 750    Primos                      Salford
VAX 780      Unix bsd 4.3                UNSW


This work was supported by the National Science
Foundation under grant no. ECS-8351942, the Data
Systems Division of the International Business
Machines Corporation, and by the Rome Air Development
Center under the postdoctoral development program
via Syracuse University.

The implementations include:

• Graphics Kernel Standard subset 

• Convex Hull 

• Planar Graph Traversal 

• Big Rational Numbers 

• Polygon Intersection 

• Organization Inference 

They will now be described in detail. 


IMPLEMENTATIONS 


Graphics Kernel Standard Subset 

This graphics addition to Prolog was implemented
by Nichols [Nichols85a] on an IBM 4341 using Waterlog
[Roberts84a], under the CMS operating system.
This allowed us to draw lines and similar primitives on
the 3270 graphics CRT from a Prolog program. We
implemented two classes of lines: permanent and
backtrackable. If the Prolog procedure that drew a
backtrackable line was backtracked over, then the line
would be erased. This used a feature of the graphics
package GSP.

The major problems were as follows. Waterlog,
like most Prologs, lacks floating point numbers, and
even four byte integers. (The latter was undocumented;
large integers just didn’t work.) However, it
has the powerful capability to be linked to programs in
other languages such as Fortran. Thus we implemented
a real number in Prolog as a data structure of
the form real(A,B) where A and B are Prolog integers
holding the upper and lower halfword, respectively, of
the four-byte value. The user never looks at A and B, but
accesses the real numbers via procedures such as
addreal(X, Y, Z) and realtointeger(R, I) that take real
numbers in the stated form and do the obvious things.
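
The real(A,B) encoding can be illustrated with a short Python sketch. It assumes a four-byte IEEE 754 single-precision value split into two 16-bit halfwords; the floating point format actually used on the IBM 4341 is not stated here, and the function names below only mirror addreal and realtointeger for illustration.

```python
import struct

def real_from_float(x):
    """Pack x as a 4-byte float and split it into (upper, lower) halfwords."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return (word >> 16, word & 0xFFFF)

def float_from_real(real):
    """Rebuild the float from its two 16-bit halfwords."""
    upper, lower = real
    (x,) = struct.unpack(">f", struct.pack(">I", (upper << 16) | lower))
    return x

def addreal(x, y):
    """Add two reals given in halfword form (hypothetical analogue of addreal/3)."""
    return real_from_float(float_from_real(x) + float_from_real(y))

def realtointeger(r):
    """Truncate a real in halfword form to an integer."""
    return int(float_from_real(r))
```

Both halfwords fit in 16 bits, so a Prolog limited to small integers can carry them around without ever interpreting them; all arithmetic is delegated to the foreign-language routines.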

Convex Hull 

This Graham Scan algorithm was implemented by
Wu [Franklin85a] on both the IBM 4341, and on the
Prime in Salford Prolog [Salford84a]. The Salford system
allows both real numbers and dynamic linking to
Fortran routines. We also tested York Prolog
[Spivey83a], which is written in Pascal. The York system
has the advantage that it is portable to any
machine that can compile a thousand line Pascal program
that uses four byte integers. Unfortunately this
did not include the official Pascal compiler available
from Prime. (We have not evaluated third-party Pascal
compilers for Prime computers.) We also tested
York Prolog on an IBM 3081 running the Michigan
Terminal System, but found the other computers’
operating systems more flexible and cheaper to use.

The algorithm proceeds as follows, using a divide
and conquer paradigm. Duplicate points are removed
and then the set of points is split into a left and a right
subset based on the points’ X-coordinates. The convex
hulls of these sets are found recursively. To merge the
two convex hulls, the top and bottom tangents, or supporting
lines, are required. The first approximation to
the top tangent is found by joining the top point of the
left convex hull to the top point of the right one. Then,
if necessary, the endpoints of the tangent are moved
right and left until the tangent does not intersect the
convex hulls (except at the endpoints). This algorithm
takes time $T = \Theta(N \log N)$.

The Prolog code is about 200 lines including comments.
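
The divide and conquer structure can be sketched in Python as follows. This is not the authors' Prolog code: for brevity the merge step does not walk the top and bottom tangents but simply recomputes the hull of the two sub-hulls' vertices with a monotone chain scan, which gives the same result at a somewhat higher cost per merge.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def chain_hull(points):
    """Monotone chain hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_hull(points):
    """Split by X-coordinate, recurse, then merge the two sub-hulls."""
    pts = sorted(set(points))          # removes duplicate points
    if len(pts) <= 3:
        return chain_hull(pts)
    mid = len(pts) // 2
    left = convex_hull(pts[:mid])
    right = convex_hull(pts[mid:])
    # Simplified merge (assumption): hull of the union of both sub-hulls.
    return chain_hull(left + right)
```

Only the vertices of the two sub-hulls take part in the merge, so interior points are discarded as early as possible.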

Boolean Combinations of Polygons 

A program to perform operations such as intersection,
union, and difference on two planar polygons was
implemented by Franklin and Wu [Franklin85a] on the
Prime and IBM 4341. The algorithm was by Franklin
[Franklin85a]. Wu first implemented a package to
perform arithmetic using rational numbers in multiple
precision. Each number, being the quotient of an
integer numerator and denominator, is implemented as
a list of the numerator and denominator. Each of
them is a list of groups of the digits of the number.
For example, 123456789/987654321 is represented as
[[56789, 1234], [54321, 9876]]. Rational numbers are
used to avoid roundoff errors, as part of an ongoing
investigation into their utility in geometry and the map
overlay problem in cartography [Franklin84a].

The big rational package was designed in several
steps as follows. First, rational numbers were implemented.
A rational number Q is stored as the expression
N/D. This is upward compatible with integers
since is, which knows nothing of rationals, thinks it is
just an integer expression. This also means that the
rational number prints normally without a separate
print procedure. We implemented a new infix operator,
isr, which operates on rationals just as is operates
on integers. It also converts integers to rationals.
Rational versions of all the integer arithmetic operators
were also implemented.

Next, a big integer arithmetic package was implemented,
along with a new infix operator are and big
versions of all the operators. Each big integer is stored
as a list of groups of digits. For 32 bit built-in integers,
each group is 4 digits. Zero is stored as [], one as [1],
72 as [72], 10001 as [1, 1], 2180077 as [80077, 21],
minus one as [-1], -123456 as [-56, -1234], and so on.
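
A minimal Python sketch of this list-of-groups representation follows. It assumes four-digit groups (base 10^4) stored least significant group first, with the sign carried on every group; this matches the examples 10001 and [3456,12], while some of the other printed examples appear to use different group sizes, so the grouping is an assumption. The function names are hypothetical.

```python
BASE = 10 ** 4  # assumed: four decimal digits per group

def to_groups(n):
    """Convert an integer to a list of signed groups, least significant first."""
    sign = -1 if n < 0 else 1
    n = abs(n)
    groups = []
    while n:
        n, group = divmod(n, BASE)
        groups.append(sign * group)
    return groups

def from_groups(groups):
    """Convert a list of groups back to an integer."""
    return sum(g * BASE ** i for i, g in enumerate(groups))

def add_groups(a, b):
    """Schoolbook addition of two non-negative group lists, with carry."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        carry, digit = divmod(total, BASE)
        result.append(digit)
    if carry:
        result.append(carry)
    return result
```

With four-digit groups the product of two groups stays below 10^8, which fits comfortably in a 32 bit integer; that is presumably the reason for the group size.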

Then these were combined into one package with 
the operator isx. Now we can say things like 

X isx ([3456,12] + 23) / [222,3]. 
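
As a cross-check, the value of this expression can be worked out in Python with the standard fractions module. The sketch again assumes four-digit groups, least significant first, so that [3456,12] denotes 123456 and [222,3] denotes 30222.

```python
from fractions import Fraction

BASE = 10 ** 4  # assumed group size, as in the big integer package

def from_groups(groups):
    """Decode a list of digit groups, least significant first."""
    return sum(g * BASE ** i for i, g in enumerate(groups))

def isx_example():
    """Evaluate ([3456,12] + 23) / [222,3] exactly."""
    numerator = from_groups([3456, 12]) + 23
    denominator = from_groups([222, 3])
    return Fraction(numerator, denominator)
```

Under these assumptions the result is the exact rational 123479/30222, which is already in lowest terms.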

The big rational package was tested by calculating
$\pi$ from the following formula, whose simplicity overrides
its very slow convergence.

$$\pi = 2 \cdot \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdot 8 \cdot 8 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdot 7 \cdot 9 \cdots}$$

The UNSW Prolog code to execute this is:

pi([], [2]).        % preset value: Pi = 2

step :-
    pi(R, P),
    R1 are R+[1],
    R2 are R1 mod [2],
    P1 isx P*((R1+R2)/(R1-R2+[1])),
    retract(pi(R,P)),
    asserta(pi(R1,P1)),
    pb(R1), prin(': '), pxq(P1), nl, !.

go :- repeat, step, fail.
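
The same iteration can be sketched in Python with exact fractions. The starting state (counter 0, value 2) is an assumption reconstructed from the "preset value" comment and from the product formula; each step multiplies by (R1+R2)/(R1-R2+1), which yields the factors 2/1, 2/3, 4/3, 4/5, and so on.

```python
from fractions import Fraction

def wallis_steps(count):
    """Return the successive exact approximations to pi after each step."""
    r, p = 0, Fraction(2)            # assumed preset: counter 0, Pi = 2
    values = []
    for _ in range(count):
        r1 = r + 1
        r2 = r1 % 2
        p = p * Fraction(r1 + r2, r1 - r2 + 1)
        r = r1
        values.append(p)
    return values
```

The first three values are 4, 8/3 and 32/9, and several hundred steps are needed before even the second decimal place settles, which illustrates the slow convergence noted above.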

The polygon combination system uses an edge
based boundary representation. Each polygon is considered
a set of edges. Here are the actual data structures.

vert(vertex_name, x, y)
edge(edge_name, name_of_first_vertex,
     name_of_second_vertex)
edge_eqn(edge_name, a, b, c)
poly(polygon_name, edge_name, which_side)

The edge equation is of the form ax + by + c = 0.
There is one poly fact for each edge of each polygon.
Since a given edge may be used by more than one
polygon, it is necessary to know which side of the edge
is the inside of this particular polygon. Legal values
are left and right.
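
To make the edge equation concrete, here is a small Python sketch that derives a, b and c from the two endpoints and reports on which side of the directed edge a point lies. The sign convention (positive means left of the direction from the first vertex to the second) is an assumption of this sketch, not something fixed by the text.

```python
def edge_eqn(p1, p2):
    """Coefficients (a, b, c) of ax + by + c = 0 through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def side(eqn, point):
    """Classify a point as 'left', 'right' or 'on' the directed edge."""
    a, b, c = eqn
    value = a * point[0] + b * point[1] + c
    if value > 0:
        return "left"
    if value < 0:
        return "right"
    return "on"
```

With integer or rational coordinates every quantity here is exact, which is the motivation for the big rational package.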

With this data structure, special cases involving
multiple edges all ending at the same vertex are not a
problem; in fact, the algorithm never knows of their
existence. This data structure also does not store any
global topology, such as the number of connected components,
and which are inside which other. This information,
which could be calculated if needed, is in fact
never necessary.

The first stage of the algorithm is basically a forward
reasoning system. It searches for cases where two
edges intersect. Whenever this is found, those two
edges are deleted, and three or four new edges are
created. There will be three new edges if one edge’s
endpoint falls on another edge. This includes the case
where the two edges are collinear. This process continues
until no edges intersect, except possibly at both
their endpoints.

This process is a little more complicated than 
appears since we are modifying the list of edge facts as 
we are iterating through it. This is one of the areas 
where different versions of Prolog behave differently. 
One solution is as follows. 

1. Handle deletions not by actually retracting the 
edge, but by asserting a deleted(edge_name) fact to 
record the information. 

2. Initially consider all edges to be of level 0. 

3. Compare all the edges pair by pair. Whenever an 
intersection is found between two edges that do 
not have an associated deleted fact, then 

a) assert a deleted fact about both of them, and 

b) create three or four new edges by asserting 
level 1 edges. 


4. Then compare all the level 1 edges against each 
other and against all the level 0 edges without 
deleted facts. If any intersect, assert new level 2 
edges and deleted facts about the intersecting 
edges. 

5. Then compare all the level 2 edges against each 
other and against all the level 0 and 1 edges. 

6. Repeat this until no new intersections are found. 

7. Finally clean up the database. 

The above procedure should be portable since it
does not modify any particular fact as control is iterating
through instances of that fact.
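
The level-by-level procedure above can be sketched in Python. This is a simplified illustration, not the authors' code: it handles only proper crossings (four new edges) and ignores the endpoint-on-edge and collinear cases, and it uses a set in place of the deleted(edge_name) facts.

```python
from fractions import Fraction

def crossing(e, f):
    """Interior crossing point of segments e and f, or None."""
    (p, q), (r, s) = e, f
    d1 = (q[0] - p[0], q[1] - p[1])
    d2 = (s[0] - r[0], s[1] - r[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if denom == 0:
        return None                      # parallel or collinear: not handled here
    t = Fraction((r[0] - p[0]) * d2[1] - (r[1] - p[1]) * d2[0], denom)
    u = Fraction((r[0] - p[0]) * d1[1] - (r[1] - p[1]) * d1[0], denom)
    if 0 < t < 1 and 0 < u < 1:
        return (p[0] + t * d1[0], p[1] + t * d1[1])
    return None

def split_all(edges):
    """Cut edges at crossings, level by level, never editing a level in place."""
    levels = [list(edges)]               # levels[k] holds the level k edges
    deleted = set()                      # stands in for the deleted facts
    while levels[-1]:
        current = levels[-1]
        older = [e for level in levels[:-1] for e in level]
        pairs = [(e, f) for i, e in enumerate(current) for f in current[i + 1:]]
        pairs += [(e, f) for e in current for f in older]
        new = []
        for e, f in pairs:
            if e in deleted or f in deleted:
                continue
            x = crossing(e, f)
            if x is not None:
                deleted.update((e, f))
                new += [(e[0], x), (x, e[1]), (f[0], x), (x, f[1])]
        levels.append(new)
    # Final clean-up: keep only the edges that were never marked deleted.
    return [e for level in levels for e in level if e not in deleted]
```

New edges are only ever appended to a fresh level, so the list being scanned is never modified during the scan, which is the portability point made above.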

Next in the boolean combination, each edge is 
classified into one of six categories: 

• an edge of polygon A that is inside polygon B, 

• on A outside B, 

• on B inside A, 

• on B outside A, 

• an edge that is on both polygons A and B, and both 
polygons are on the same side of it, and 

• on both polygons, and they are on opposite sides of 
it. 

Finally, a subset of the edges is selected depending
on the particular result desired. For example, in a
union, edges on either polygon that are outside the
other polygon, plus edges on both polygons with both
on the same side, are needed. Since this selection
takes almost no time, all the boolean combinations are
found at no extra cost. For example, see figure 1,
where polygon A is ABCD and polygon B is EFGHIJ.
After intersecting edges are cut, edges AB and EF are
cut into AB, EB, and BF. HI is cut into HC and CI.
When the resulting edges are classified, edge AB is on
polygon A outside of B. Edge EJ is on B inside A.
Edge EB is on both polygons A and B, and they are on
the same side. In contrast, edge CI is on both
polygons, but they are on opposite sides.
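
The selection step amounts to a table lookup, as the following Python sketch shows. The union row follows the description above; the intersection and difference rows are the usual choices for this kind of classification and are assumptions of this sketch, as are the category names.

```python
# Which edge categories are kept for each boolean operation.
# Only the "union" row is stated in the text; the others are assumed.
SELECT = {
    "union": {"a_outside_b", "b_outside_a", "shared_same_side"},
    "intersection": {"a_inside_b", "b_inside_a", "shared_same_side"},
    "difference": {"a_outside_b", "b_inside_a", "shared_opposite_sides"},
}

def select_edges(classified, operation):
    """Return the names of the edges kept for the given operation."""
    keep = SELECT[operation]
    return [name for name, category in classified if category in keep]

# The four edges discussed for figure 1.
FIGURE_1 = [
    ("AB", "a_outside_b"),
    ("EJ", "b_inside_a"),
    ("EB", "shared_same_side"),
    ("CI", "shared_opposite_sides"),
]
```

Because the classification is computed once, producing a second or third boolean combination costs only another pass over the classified edges.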


Figure 1: Combining Polygons ABCD and EFGHIJ

Planar Graph Traversal

At some point during an object space hidden surface
algorithm [Franklin80a], we have the set of the
visible edges and must join them to find the visible
polygons. This requires a planar graph traversal, sometimes
called a tessellation.

Figure 2: Finding the Faces of a Planar Graph

For example, in figure 2 we are given the vertices and
edges in the form

vert(vert-name, x-coord, y-coord)
edge(edge-name, first-vertex, second-vertex,
     angle)

for example

vert(v1, 0, 0)
edge(e1, v1, v2, 0)

The angle of the edge is supplied because of the 
difficulty of computing arctangents using only integers. 
The output is a set of facts of the form 

polygon([v1, v2, v3, v4]) 

This was implemented in UNSW Prolog [Sammut83a] 
on a Vax. 

Organization Inference 

In this work, described in more detail in
[Samaddar85a], we wish to infer which units of an
army organization are present after seeing, via
photoreconnaissance, an incomplete picture of the equipment
they possess. The army organization, parts of
which may be present in the photo, is described with
Prolog facts such as the following.

child(Father, Son, Number)

This says that unit Father ideally contains Number of
the subunit Son. For example, part of a Soviet motorized
rifle division might be defined thus:

child(motorized_rifle_division,btr_regiment,2). 
child(motorized_rifle_division,bmp_regiment, 1). 
child(motorized_rifle_division,tank_regiment, 1). 
child(motorized_rifle_division, 
artillery_regiment, 1). 
child(bmp_regiment,bmp_battalion,3). 
child(bmp_battalion,bmp_company,3). 
child(bmp_company,bmp_platoon,3). 
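
Facts of this shape lend themselves to a simple recursive roll-up. The Python sketch below copies the child facts listed above and counts how many subunits of a given kind a unit ideally contains; the function name is hypothetical and the sketch is not part of the inference engine itself.

```python
# The child(Father, Son, Number) facts listed above, as tuples.
CHILD = [
    ("motorized_rifle_division", "btr_regiment", 2),
    ("motorized_rifle_division", "bmp_regiment", 1),
    ("motorized_rifle_division", "tank_regiment", 1),
    ("motorized_rifle_division", "artillery_regiment", 1),
    ("bmp_regiment", "bmp_battalion", 3),
    ("bmp_battalion", "bmp_company", 3),
    ("bmp_company", "bmp_platoon", 3),
]

def count_subunits(unit, target):
    """Ideal number of target units contained in unit, at any depth."""
    if unit == target:
        return 1
    return sum(number * count_subunits(son, target)
               for father, son, number in CHILD if father == unit)
```

For the facts given, a bmp_regiment ideally contains 3 x 3 x 3 = 27 bmp_platoons.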

The equipment that each unit possesses is
described by the following form of fact:

eqpmnt_overall(Unit, Ename, Number)

Unit is the name of the unit that owns the equipment,
such as art_reg for an artillery regiment. Ename is the
name of the equipment, such as sa_6 for an SA-6
anti-aircraft missile. Number is the maximum number of
pieces of equipment that that unit can own. The fact
for a particular unit includes only equipment that the
unit owns directly, and not equipment owned by a
subunit. Some sample facts are:

eqpmnt_overall(art_reg, sa_6, 20). 

eqpmnt_overall(mr_div, amphi_brdm, 48). 

Then facts defining what equipment has been 
recognized are stated as follows: 

equipment(Name, Number) 

For example, 

equipment(sa_7, 7). 

equipment(rpg_7, 23). 

Given this information, the inference engine reports 
that 

Based on that, my first guess about the unit 
present, and the remaining equipment associated 
with it, is: 

Remaining = [[arm_per_car_btr, 38], 
[mortar_120mm_1943, 6]] 

Unit = mot_rif_btln_btr 

This inference engine is designed to be part of a
larger blackboard format system where a low level
image interpretation and geometry engine makes a first
guess about the objects present and passes the information
up to this unit. The output of this unit can be
used to bias the prior probabilities of the geometry system
as it continues to look.

This system is robust since it automatically handles
the cases of the unit on the ground being under
strength, and the image interpretation system not
finding everything.


STRENGTHS AND WEAKNESSES OF PROLOG 

Certain advantages and disadvantages of Prolog 
for graphics and geometric applications are becoming 
evident from these implementations. 

Advantages Of Prolog 

• Prolog has the same high-level advantages as Lisp, such as
the equivalence of code and data and dynamic data
allocation.

• There are also advantages specific to Prolog.
Unification makes determining graph connectivity a
primitive operation and in general is useful for propagating
transitive properties, such as graph connectivity,
which occur frequently. This is a counterexample
to the proposition that “Unification is what
you do when you don’t know what you are doing”.

• The pattern matching fits the form of expression
of many algorithms. For example, our polygon
combination algorithm proceeds as follows. Whenever
the pattern of two edges intersecting, or of one
edge ending on the interior of another edge, occurs,
retract those edges and assert new smaller
edges. When this pattern no longer exists, we
have a superset of the edges in the output polygon.

• Although many of the above features could be
implemented in any language that is Turing
equivalent, Prolog is somewhat standard, so that
different researchers can understand and use each
other’s extensions.

Disadvantages Of Prolog 

However, there are some problems with using Prolog
for geometry.

• There are software engineering problems with using
Prolog for a large project because of its lack of nesting
in the program and databases.

• Many geometry algorithms are more natural to a
forward reasoning system than a backward reasoning
system. That is, we are more likely to want the
output from some given input than the reverse.

• The natural way of expressing pattern matching
algorithms requires us to modify a database as we
are searching through it. Thus in polygon overlay,
whenever we find the pattern of two edges crossing,
we retract them and assert four new edges. Backtracking
and redoing a database that we are modifying
does not work on all Prologs.

• Prolog does not support coroutines, which are a
natural way to express many algorithms.

• In general, Prolog is completely unstandardized
around the fringes, as some tests of cuts in
[Moss85a] show.


PARADIGMS OF PROGRAMMING 

Certain techniques have proven to be generally 
useful in our implementations, and may be useful to 
others also. They include the following paradigms. 

Set Based Algorithms 

Many algorithms, such as the polyhedron intersection
and hidden surface algorithms of Franklin [Franklin82a,
Franklin80a], alternate between two types of steps:

• Applying a function to every element of a set, and

• Combining all the elements having a common key.

This is clearly easy in Prolog.
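
The two step types can be illustrated with a few lines of Python. The example (listing the edges incident to each vertex) and its names are hypothetical; it only shows one "apply to every element" step followed by one "combine by common key" step.

```python
from collections import defaultdict

def edges_by_vertex(edges):
    """Group edge names by the vertices they touch."""
    # Step 1: apply a function to every element of the set of edges,
    # producing one (vertex, edge_name) pair per endpoint.
    pairs = [(v, name) for name, first, second in edges for v in (first, second)]
    # Step 2: combine all the elements having a common key (the vertex).
    groups = defaultdict(list)
    for vertex, name in pairs:
        groups[vertex].append(name)
    return dict(groups)
```

In Prolog the first step is typically a failure-driven loop or a findall over the facts, and the second a sort or a bagof on the key.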

Pattern Matching 

The second paradigm uses pattern matching to
propagate certain properties. For example, in the
planar graph traversal algorithm, the edges around each
vertex are found and sorted by the angle at which they
leave it. Then the edges around each vertex are paired
to form corners. These corners can be considered to be
fragments of the output polygons. Whenever two fragments
exist such that the last edge of one is the same
as the first edge of another, then these two fragments
are retracted and a single longer fragment asserted.
When such a pattern no longer exists, then we have the
output polygons.
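
The corner and fragment steps can be sketched in Python as follows. Two points are assumptions of this sketch: the angles are computed with atan2 rather than supplied with the edge facts, and each corner pairs an incoming directed edge with the next outgoing edge clockwise around the vertex. Function names are hypothetical.

```python
import math

def corners(verts, edges):
    """Pair the edges around each vertex into two-edge fragments."""
    around = {v: [] for v in verts}
    for u, v in edges:
        around[u].append(v)
        around[v].append(u)
    result = []
    for v, nbrs in around.items():
        x, y = verts[v]
        nbrs.sort(key=lambda w: math.atan2(verts[w][1] - y, verts[w][0] - x))
        for i, u in enumerate(nbrs):
            w = nbrs[i - 1]                      # next edge clockwise from v->u
            result.append([(u, v), (v, w)])      # arrive from u, leave towards w
    return result

def merge_fragments(fragments):
    """Join fragments while one ends with the edge another starts with."""
    frags = [list(f) for f in fragments]
    merged = True
    while merged:
        merged = False
        for a in frags:
            for b in frags:
                if a is not b and a[-1] == b[0]:
                    frags.remove(a)
                    frags.remove(b)
                    frags.append(a + b[1:])
                    merged = True
                    break
            if merged:
                break
    return frags

def polygons(verts, edges):
    """Vertex lists of all faces, each rotated to start at its smallest name."""
    result = []
    for frag in merge_fragments(corners(verts, edges)):
        names = [u for u, _ in frag[:-1]]        # drop the repeated closing edge
        k = names.index(min(names))
        result.append(names[k:] + names[:k])
    return sorted(result)
```

For a single square this yields two polygons, the interior face and the unbounded exterior face traversed in the opposite direction; a caller interested only in bounded faces would discard the latter.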

Unification 

Frequently we wish to determine the closure of
some transitive property, such as when we are given a
set of graph edges edge(u, v), and wish to determine the
connected components. We have implemented the following
short algorithm that uses unification and the set
processing paradigm.

Figure 3: Determining Graph Connectivity

• Create a property list (plist) with one record per
vertex, and the property of each vertex a free variable.
For example, in figure 3 we would have
[[1,_], [2,_], [3,_], [4,_], [5,_], [6,_]].

• Process the set of edges and for each edge unify the
free variable properties of the endpoints. After this
we will have [[1,_1], [2,_1], [3,_1], [4,_2], [5,_2],
[6,_3]] with one unique free variable per graph component.

• Bind a name identifying each component to the free
variables in the list to give something like [[1,first],
[2,first], [3,first], [4,second], [5,second], [6,third]].
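
Python has no built-in unification, but the three steps can be imitated with a tiny logic-variable class. The edge list used in the usage note is hypothetical (figure 3 itself is not reproduced here); it is chosen only so that the components match the lists shown above.

```python
class Var:
    """A free variable that can be bound to another variable."""
    def __init__(self):
        self.ref = None

def deref(v):
    """Follow bindings to the representative variable."""
    while v.ref is not None:
        v = v.ref
    return v

def unify(a, b):
    """Make two variables share the same representative."""
    a, b = deref(a), deref(b)
    if a is not b:
        a.ref = b

def components(vertices, edges, names):
    """Return [[vertex, component_name], ...] in vertex order."""
    plist = {v: Var() for v in vertices}         # one free variable per vertex
    for u, v in edges:                           # unify the endpoints of each edge
        unify(plist[u], plist[v])
    labels, fresh, result = {}, iter(names), []
    for v in vertices:                           # bind a name to each remaining variable
        root = deref(plist[v])
        if root not in labels:
            labels[root] = next(fresh)
        result.append([v, labels[root]])
    return result
```

With the assumed edges (1,2), (2,3) and (4,5) on six vertices, the result is [[1,first], [2,first], [3,first], [4,second], [5,second], [6,third]]. The chain of references followed by deref plays the role of Prolog's variable-to-variable bindings.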

A longer example of a simple hidden surface algorithm
would go as follows.

• Wherever the pattern of two edges’ projections
intersecting occurs, split the edges into four smaller
edge segments.

• For each edge segment, find the set of faces hiding
its midpoint. The edge segment is visible if and only
if this set is empty. Draw the visible segments.

• Use a planar graph traversal algorithm such as
described above to link the visible edges into
polygons.

• For each polygon, find a point inside it and then
find the set of faces whose projections contain the
projection of that point. Find the closest such face;
the polygon came from it. Color the polygon
accordingly.

This illustrates all of the paradigms operating together.


SUMMARY

Although not perfect, Prolog is a powerful tool for
expressing graphical and geometry algorithms in a concise
and natural format. This allows larger problems to
be solved in a given time, and raises the size of the
largest problem that it is feasible to solve.


REFERENCES 

Bruderlin85a.

Beat Bruderlin, “Using Prolog for Constructing
Geometric Objects Defined by Constraints,” Eurocal 85,
Conference Proceedings, Linz, Austria,
1985. Institut für Informatik, ETH Zurich, CH-8092,
Zurich, Switzerland.

Clocksin81a.

W.F. Clocksin and C.S. Mellish, Programming In
Prolog, Springer-Verlag, New York, 1981.

Coelho80a.

H. Coelho, J.C. Cotta, and L.M. Pereira, How to
Solve it With Prolog, 2nd edition, Ministerio da
Habitacao e Obras Publicas, Laboratorio Nacional
de Engenharia Civil, Lisboa, 1980.

Franklin80a.

Wm. Randolph Franklin, “A Linear Time Exact
Hidden Surface Algorithm,” ACM Computer
Graphics, vol. 14, no. 3, pp. 117-123, July 1980.
Proceedings of SIGGRAPH ’80.

Franklin82a.

Wm. Randolph Franklin, “Efficient Polyhedron
Intersection and Union,” Proc. Graphics Interface ’82,
pp. 73-80, Toronto, 19-21 May 1982.

Franklin84a.

Wm. Randolph Franklin, “Cartographic Errors
Symptomatic of Underlying Algebra Problems,”
Proc. International Symposium on Spatial Data
Handling, vol. 1, pp. 190-208, Zurich, Switzerland,
20-24 August 1984.

Franklin85a.

Wm. Randolph Franklin and Peter Y.F. Wu, Convex
Hull and Polygon Intersection Implemented in
Prolog, Rensselaer Polytechnic Institute, Troy,
NY, July 1985.

Gonzalez84a.

J.C. Gonzalez, M.H. Williams, and I.E. Aitchison,
“Evaluation of the Effectiveness of Prolog for a
CAD Application,” IEEE Computer Graphics and
Applications, pp. 67-75, March 1984.

Moss85a.

Chris Moss and Earl Fogel, Tests to Distinguish
Various Implementations of Cut in Prolog,
Imperial College and Logicware Inc., June 1985.
Reported on Usenet in Net.lang.Prolog, message-id
<1742@utecfa.UUCP>.

Nichols85a.

Margaret Nichols, The Graphic Kernal System in
Prolog, ECSE Dept., Rensselaer Polytechnic Institute,
Masters Thesis, Troy, NY, August 1985.

Roberts84a.

Grant Roberts, Waterloo Core Prolog Users
Manual (version 1.5), Intralogic Inc., Waterloo,
Ont., Canada, 1984.

Salford84a.

University of Salford, LISP/PROLOG Reference
Manual, March 1984.

Samaddar85a.

Sumitro Samaddar, An Expert System for Photo
Interpretation, ECSE Dept., Rensselaer Polytechnic
Institute, Masters Thesis, Troy, NY, August 1985.

Sammut83a.

Claude Sammut, UNSW Prolog User Manual,
University of New South Wales (Australia), 1983.

Spivey83a.

J. M. Spivey, University of York Portable Prolog
System (Release 1) User’s Guide, York, U.K.,
March 1983.

Swinson82a.

P.S.G. Swinson, “Logic Programming: A Computing
Tool for the Architect of the Future,” Computer
Aided Design, vol. 14, no. 2, pp. 97-104,
March 1982.

Swinson83b.

P.S.G. Swinson, “Prolog: A Prelude to a New
Generation of CAAD,” Computer Aided Design,
vol. 15, no. 6, pp. 335-343, November 1983.

Swinson83a.

P.S.G. Swinson, F.C.N. Pereira, and A. Bijl, “A
Fact Dependency System for the Logic Programmer,”
Computer Aided Design, vol. 15, no. 4, pp.
235-243, July 1983.


Graphics Interface ’86 Vision Interface ’86 




THE INFERENCE MACHINE LABORATORY: GRAPHIC 
TOOLS FOR KNOWLEDGE MANAGEMENT 

J.W. Lewis, Ph.D.

Artificial Intelligence Department 
Martin Marietta Laboratories 
1450 South Rolling Road 
Baltimore, MD 21227 


ABSTRACT 

The Inference Machine Laboratory is a collection of 
experiments in applying graphic interfaces to various types 
of knowledge bases. Each experiment involves a canonical 
representation, invertible transformations into multiple 
representations, and multiple directly manipulable views 
of those representations. Initial experiments include 
RULE*CALC (simple production rules), HAPStation 
(OPS5-like production rules), RFEX (diagnostics), and 
TIMLS (frames in PROLOG). 

KEYWORDS: expert systems, intelligent interfaces, 
knowledge acquisition, direct manipulation 


INTRODUCTION 

The major barrier to successful expert systems
development continues to be the acquisition, review,
restructuring, and long-term maintenance of large
knowledge bases involving complex relationships [1].
Meaningful military and industrial expert systems are
expected to require as many as 10,000 "rules" [2], domain
coverage of better than 95%, and an error rate of less
than 0.1%. To achieve these performance figures,
perhaps one to two orders of magnitude beyond the
current state of the art, the next generation of knowledge
management tools must enable each individual involved in
designing, building, using, and maintaining knowledge
bases to view, understand, and manipulate their contents
in an intuitive manner.

For simple interactive systems, adequate tools and
techniques are already available. The direct manipulation
of icons has already led to successful icon-based interfaces
for commercial systems such as the XEROX Star and the
Apple Macintosh [3]. Research activities, such as those in
the MIT Media Graphics Laboratory, have shown impressive
capabilities for text and video interfaces [4]. More
recently, tools such as UNITS [5] and GEN-X [6] have
provided effective interfaces to knowledge bases in expert
systems.


THE INFERENCE MACHINE LABORATORY 

The Inference Machine Laboratory (IML) addresses
these performance goals for expert systems by providing
each knowledge-base user with a tailored set of directly
manipulable views of the knowledge base and by maintaining
consistency among the various views. Over the
last two years, a family of successively more complex
knowledge management systems has been constructed.
The early systems have involved simple production rule
knowledge bases, and the later systems are based on
predicate calculus representations.

Physically, the laboratory consists of two VAX computers,
several LISP machines, a color monitor, a color
video projector, several mice, a foot-mouse, a DECTALK
voice generator, and a Polhemus 3-D graphics pointing
device. The half-dozen individuals interacting with the
knowledge base are grouped around a small table in front
of the projection screen. They are able to select views,
make queries against the knowledge base, and modify the
knowledge base using the interactive graphics devices.

All of the software systems in the laboratory fit into a
common framework (Fig. 1), which supports various kinds
of graphic input/output, logic-based knowledge representations,
and natural language input/output. In the IML,
each system user works with a particular set of windows
on the knowledge base. These windows are defined by

o Virtual cameras - to generate shaded images, schematics,
  tables, graphs, trees, and other diagrams,

o Views - to specify the particular image generated by
  defining the location of the user in the knowledge base, and

o Filters - to determine the granularity, or level of detail,
  in a particular view.

Once a view is presented, the user can modify the 
knowledge base by pointing at particular elements of the 
view and indicating changes. 

RULE* CALC 

The simplest and earliest project in the laboratory is 
RULE*CALC, a VTSI-CALC-style development environ¬ 
ment for EMYCIN-class production rules with uncertainty 
[7]. The rules are laid out in a spreadsheet-like tableau. 
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Figure 1. The Inference Machine Laboratory framework 


(See Fig. 2 for a few of the rules in a 300-rule system for
FAA radar trouble-shooting.) Each row in the tableau
corresponds to a fact in the rules, and each column
represents one rule. Rules are defined by entering a symbol
in the appropriate column, which indicates how that
fact is involved in the particular rule. A blank indicates
that the fact is not involved in the rule at all; an = or ~=
symbol indicates that the fact is a positive or negative
clause in the lefthand side of the rule; and a :-T or :-F
symbol indicates that the fact is asserted or negated when
that particular rule fires. The user modifies the table by
pointing to the appropriate entry with a mouse and hitting
function keys for setting symbols in the table,
cutting/pasting rules (columns), cutting/pasting facts
(rows), and looking at different windows on the set of
rules.
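
The tableau encoding described above can be illustrated with a small sketch. It is not RULE*CALC itself: the fact names, the rule names R1 and R2, and the helper functions are all hypothetical, and the EMYCIN-style uncertainty handling is left out. Only the four cell symbols come from the text.

```python
# Hypothetical tableau: rows are facts, columns are rules (all names illustrative).
# "=" / "~=" mark positive / negative LHS clauses; ":-T" / ":-F" mark RHS actions.
TABLEAU = {
    "power_on":      {"R1": "=", "R2": "~="},
    "display_blank": {"R1": "=", "R2": ":-F"},
    "fuse_blown":    {"R1": ":-T"},
}


def rule_column(tableau, rule):
    """Read one rule by scanning its column; blank cells are simply absent."""
    return {fact: row[rule] for fact, row in tableau.items() if rule in row}


def can_fire(cells, state):
    """True when every LHS clause agrees with the known truth values in state."""
    for fact, symbol in cells.items():
        if symbol == "=" and state.get(fact) is not True:
            return False
        if symbol == "~=" and state.get(fact) is not False:
            return False
    return True


def fire(cells, state):
    """Apply the RHS symbols of a rule and return the updated fact values."""
    new_state = dict(state)
    for fact, symbol in cells.items():
        if symbol == ":-T":
            new_state[fact] = True
        elif symbol == ":-F":
            new_state[fact] = False
    return new_state


state = {"power_on": True, "display_blank": True}
r1 = rule_column(TABLEAU, "R1")
if can_fire(r1, state):
    state = fire(r1, state)
print(state)
```

Storing the table by fact row and reading a rule as a column mirrors the layout in the text; a real editor would also need the cut/paste operations and the certainty values, which this sketch omits.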

Each fact defines a simple object which could have 
one or more of three associated action procedures 
(methods): on-demand (triggered when a value is 
requested), on-true (triggered when the fact is asserted 
true), and on-false (triggered when the fact is asserted 
false). If the system is run from a terminal, the methods 
type out messages on the screen and elicit responses from 
the keyboard. If the system is run by calling in on a telephone,
the methods generate a voice over the phone and
elicit responses from the telephone keypad. 

When the system executes, the state of the inference 
engine and the resulting dialog can be displayed in a pair 
of windows in the rule-debugging screen. One window 
shows the interactive dialog and the other displays the rule 
stack along with the facts involved in those rules. Altogether,
in RULE*CALC there are five fixed views: the
rule-edit tableau, fact-edit tableau, method definition, 
interactive dialog, and rule-debugging. 


HAPStation

The second system is HAPStation, a somewhat more
complex inference engine with a simpler interactive interface.
HAPStation runs on a SYMBOLICS LISP
workstation in COMMON LISP (Fig. 3). HAPS is a
forward-chaining, production-rule-based language similar
in style to OPS4, GRAPES, and OPS83 [8,9]. Like other
forward-chaining production rule systems, HAPS is composed
of two memories, a working memory (WM) and a
production rule memory (PM), with an accompanying
interpreter. The working memory elements are composed
of a sequence of terms in parentheses, e.g., "(this is a
working memory element)." The production rules are composed
of a lefthand side (also called the LHS, antecedent,
or IF-part) and a righthand side (also called the RHS, consequent,
or THEN-part). The LHS of each rule is composed
of a sequence of positive and negative clauses, which
are in turn composed of the patterns to be matched
against the WM. The RHS of each rule is composed of a
sequence of action terms involving making and removing
working memory elements, computing expressions, and
input/output.

The interpreter repeats a recognize-act cycle
in which it first searches for all working memory elements
that match the clauses in the LHS of the rules. The resulting
set of rules is called the conflict set (CS), and the rule
that fires is selected from the CS by a programmable
conflict-resolution strategy. When the rule fires, the terms
in its RHS are executed in sequence. This cycle continues
until the conflict set is empty or a rule executes
a HALT action.
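
A minimal sketch of such a recognize-act cycle follows. It is not HAPS: the rule format (a dictionary with "pos", "neg" and "rhs" entries), the "?name" variable syntax, the first-match conflict-resolution strategy and the max_cycles guard are all assumptions made for illustration.

```python
def match(pattern, element, bindings):
    """Match one pattern against one WM element; terms starting with '?' are variables."""
    if len(pattern) != len(element):
        return None
    new = dict(bindings)
    for p, e in zip(pattern, element):
        if p.startswith("?"):
            if new.setdefault(p, e) != e:
                return None
        elif p != e:
            return None
    return new


def instantiations(rule, wm):
    """Bindings that satisfy every positive clause and no negative clause."""
    candidates = [{}]
    for clause in rule["pos"]:
        extended = []
        for b in candidates:
            for element in wm:
                b2 = match(clause, element, b)
                if b2 is not None:
                    extended.append(b2)
        candidates = extended
    return [b for b in candidates
            if not any(match(c, el, b) is not None
                       for c in rule["neg"] for el in wm)]


def run(rules, working_memory, max_cycles=100):
    """Recognize-act loop: build the conflict set, pick one rule, execute its RHS."""
    wm = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        ordered = sorted(wm)
        conflict_set = [(r, b) for r in rules for b in instantiations(r, ordered)]
        if not conflict_set:
            break                           # empty conflict set ends the run
        rule, b = conflict_set[0]           # assumed strategy: first instantiation wins
        fired.append(rule["name"])
        for action, term in rule["rhs"]:
            if action == "halt":
                return wm, fired
            element = tuple(b.get(t, t) for t in term)
            if action == "make":
                wm.add(element)
            elif action == "remove":
                wm.discard(element)
    return wm, fired


# Hypothetical rule: start a pump for every low tank that has no pump running.
START_PUMP = {
    "name": "start-pump",
    "pos": [("tank", "?t", "low")],
    "neg": [("pump", "?t", "on")],
    "rhs": [("make", ("pump", "?t", "on"))],
}
final_wm, fired = run([START_PUMP], {("tank", "a", "low"), ("tank", "b", "low")})
print(sorted(final_wm), fired)
```

The negative clause is what stops the example: once a pump element exists for a tank, the rule no longer matches that tank, so the conflict set eventually becomes empty.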

The interactive interface is composed of a set of windows
on the memories in the production rule interpreter.
The central window (or HAPS window) contains the normal
sequential dialog with the interpreter. The HAPSedit
window enables the user to modify the rules through the
ZMACS "smart" editor on the LISP machines. Other windows
display the WM, the CS, and the possible matches
with the LHS of a rule. Many of the elements of the screen
are "mousable" so that the user can call up a pop-up menu
of actions (run, stop, etc.) and a menu of the rule names
against which to apply those actions.

Figure 2. A RULE*CALC tableau


INTEGRATED DIAGNOSTIC SYSTEMS 

The third project area is IDS (Integrated Diagnostic
Systems). This project is focussed on a specific application:
isolating and addressing faults in complex systems.
In particular, the project addresses intermittents, multiple
faults, consequent faults, design errors, unusual operating
modes, and other failures that challenge teams of experts.
The IDS has two interfaces, one that serves the domain
expert in building or applying the knowledge base and
another that serves the end user during the troubleshooting
process. The expert interface shows multiple views of
the device and of the status of the inference engine working
against it. For example, the RFIX system in Fig. 4
shows six distinct windows on the robot being diagnosed,
the robot’s schematic diagram, the dialog, and the underlying
inference engine.

Figure 3. A HAPStation screen

The physical window is generated by a Gouraud shading
algorithm from a faceted surface model of the robot.
The drive motors are shown through the translucent skin 
of the robot. The schematic view shows a block diagram 
of the LSI11 CPU and of the analog control system. The 
other windows show the current action, current question, 
and the current rule stack. In each window, the status of 
each of the elements of the robot control system is 
indicated by a color: blue to indicate state unknown but 
presumed good, green to indicate known good, and red to 
indicate a known failure. Eventually, all elements of the 
display will be independently mousable, enabling the user 
to call up parts descriptions, to explain the detailed state 
of the indicated object, and to request that tests be 
applied to the object. 


PROLOG QUERY TABLE 

The final project area is TIMLS (The Inference 
Machine Laboratory System). The focus of TIMLS is 
building and maintaining situation assessment or planning 
knowledge bases for autonomous vehicles. The current 
knowledge base is a complex PROLOG-based frame system
that implements simple property inheritance and temporal
logic using the method of temporal arguments [10].
The model scenario was drawn from the September 1943
British X-craft midget submarine attack on the German
battleship Tirpitz [11]. All events in that attack are captured
in a PROLOG data base. The data base includes
more than one hundred relations of the form

    break_clear (X-craft, Barrier, Time, Flags).
    come_out_to (X-craft, Object, Time, Flags).
    come_out_from (X-craft, Enclosure, Time, Flags).
    dive_into (X-craft, Enclosure, Time, Flags) . . .

As in the other projects, the interface provides multiple
interactive views of the knowledge base. Two of the
windows are color-map views of the Kaa Fjord at different
levels of detail. In those windows, the paths and positions
of the German and British ships are shown relative to obstacles
such as submarine nets and buoys. The other windows
contain the raw PROLOG, an English-language
description of the weather generated from case frames,
and the PROLOG query table (PQT).

Figure 4. RFIX: The Robot Troubleshooting System

Like RULE*CALC, the PQT follows the VISI-CALC 
paradigm. It is a close relative of the Query By Example 
(QBE) System [12] and PROLOG implementations of it 
[13]. The recently implemented PQT consists of two 
separate windows (Fig. 5). The upper window shows the 
current constraints on subsequent queries. Each constraint
involves fixed values for one or more of the binary
relations defined on the objects in the data base by the
frame system. For the example given, the constraints are
that the time is 9/22/1943 and that some number of
X-craft are in the Kaa Fjord.


SITUATION REPORT

Partly cloudy sky today will cover the upper section of
Norway and showers will occur. The winds will be from
the NW at 15 to 20 knots. The temperature will reach 35
to 40 degrees and visibility will be moderate to good.

PROLOG QUERY TABLE

Constraints...

TIME                              LOCATION
X ON 9/22/1943    X-CRAFT         KAA-FJORD

Query/Response...

OBJECT   SINK         LOCATION                          TIME
X-6      UNDERWATER   UNDER(STARBOARD(BOW(TIRPITZ)))    7:30 ON 9/22/1943
X-7      UNDERWATER   UNDER(BATTLE_PRACTICE_TARGET_1)   8:30 ON 9/22/1943

Figure 5. Two of the TIMLS windows


The header of the second window is the specific query
on the data base. Each header element is the name of
another relation defined by the frame system. The body of
the table is the response to the constrained query in the
form of a list of n-tuples. The other windows echo that
query with path(s) on the map or a new natural-language
weather report. Many of the elements in the PQT are
mousable so that new relations (columns) and constraints
(upper window) can be defined, or old relations and constraints
removed.
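
The two-window arrangement can be sketched as a constrained query over binary relations. This is only an illustration in Python, not the PROLOG implementation: the relation names "type", "area" and "date", the helper names, and the TIRPITZ entries are assumptions, while the X-6 and X-7 values echo Figure 5.

```python
# Binary relations stored as (relation, object, value) triples.
# The relation names "type", "area" and "date" are hypothetical.
TRIPLES = [
    ("type", "X-6", "X-CRAFT"), ("area", "X-6", "KAA-FJORD"),
    ("date", "X-6", "9/22/1943"), ("sink", "X-6", "UNDERWATER"),
    ("time", "X-6", "7:30 ON 9/22/1943"),
    ("type", "X-7", "X-CRAFT"), ("area", "X-7", "KAA-FJORD"),
    ("date", "X-7", "9/22/1943"), ("sink", "X-7", "UNDERWATER"),
    ("time", "X-7", "8:30 ON 9/22/1943"),
    ("type", "TIRPITZ", "BATTLESHIP"), ("area", "TIRPITZ", "KAA-FJORD"),
    ("date", "TIRPITZ", "9/22/1943"),
]


def frames(triples):
    """Group the triples by object, giving one slot dictionary per object."""
    table = {}
    for relation, obj, value in triples:
        table.setdefault(obj, {})[relation] = value
    return table


def pqt(triples, constraints, header):
    """Upper window: fixed relation values. Lower window: one n-tuple per matching object."""
    rows = []
    for obj, frame in frames(triples).items():
        if all(frame.get(rel) == val for rel, val in constraints.items()):
            rows.append((obj,) + tuple(frame.get(rel) for rel in header))
    return rows


constraints = {"type": "X-CRAFT", "area": "KAA-FJORD", "date": "9/22/1943"}
for row in pqt(TRIPLES, constraints, ["sink", "time"]):
    print(row)
```

Adding or removing a column corresponds to editing the header list, and adding or removing a constraint corresponds to editing the constraints dictionary, which is the kind of interaction the mousable table elements provide.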

ABSTRACT

PORTRAY is an image synthesis system which uses ray 
tracing to produce realistic images of three-dimensional 
scenes. Scenes are described to PORTRAY in a high-level 
description language. The basic geometric modelling 
technique is constructive solid geometry using primitive solids 
bounded by planes and quadrics. A variety of optical 
characteristics and phenomena may be specified. The scene 
description language allows the user to define object classes 
which may be used as if they were built-in primitives. 
PORTRAY uses a number of techniques, including a novel 
technique exploiting bounding volume coherence, to improve 
its ray tracing performance. PORTRAY is supported by an 
array of image manipulation tools which share a common 
image storage format. 

KEYWORDS: bounding volume coherence, constructive 
solid geometry, illumination models, image synthesis, ray 
tracing. 

1. Introduction 

PORTRAY is an image synthesis system which generates 
realistic pictures of three-dimensional scenes. Scenes are 
described to PORTRAY using a high-level scene description 
language (SDL). The scene description is processed by a 
scene compiler called “PRCOMP”. PRCOMP produces a 
file which describes the scene in a lower-level language. This 
intermediate file is read by the rendering program, simply 
called “PORTRAY”, which uses ray tracing to produce an 
image of the scene. Figure 1 illustrates the structure of the 
system. 



Figure 1 

The PORTRAY image synthesis system is implemented in the
C programming language, and runs under UNIX on VAX, 
Sun, and Pyramid computers. 

This paper describes the geometric and optical modelling 
techniques used in PORTRAY, the scene description 
language, the processes of scene compilation and rendering, 
and the image format and image manipulation tools used with 
PORTRAY. 

2. Scene Description Language 

The SDL is a critical part of the PORTRAY system, since it 
determines the ease with which a user of the system can 
model the objects he wants in a given scene. With a particular 
SDL, some scenes may be impossible to describe, and many 
more scenes may be impractical to describe. The PORTRAY 
SDL incorporates a powerful geometric modelling technique 
and a variety of optical modelling techniques to give the user 
a large amount of descriptive power [1]. 

2.1 Geometric Modelling 

The shapes of objects are described to PORTRAY by means 
of constructive solid geometry (CSG). CSG descriptions are 
expressions involving Boolean combinations of primitive 
solids. PORTRAY uses spheres, cones, cylinders, and cubes 
as primitive solids. These primitives have quadric and planar 
surfaces which make the problem of determining the 
intersection between a ray and the primitive quite simple. 
Primitives may be moved to arbitrary locations and rotated 
and scaled as desired. Unequal scaling may be used, for 
example, to turn a sphere into an ellipsoid, or a cube into an 
arbitrary rectangular block. 

Primitive solids are combined using regularized set operators 
[2], namely union, intersection, and subtraction. The 
combination of two solids is guaranteed to be another well- 
defined solid. Any two CSG expressions can be combined 
using any of the three operators to obtain another CSG 
expression. Quite elaborate objects may easily be constructed 
in this way (see Figure 2). 

The use of solid modelling instead of surface modelling 
entails some additional cost. To ray trace a solid consisting of 
a combination of primitive solids, we must first find the 
intersections between a given ray and each of the primitive 
solids. The resulting lists of intersections are then merged in a 
way which depends on the semantics of the CSG operator. In
general, we must find all intersections between a ray and the 
object, even though we only use the intersection nearest to the 
eye in most cases. This is because more distant intersections 
may affect the outcome of a merge. Ray tracing of surface 
models usually only requires that we find the nearest 
intersection between the ray and each surface. However, the 
additional cost of ray tracing solid models is justified by the 
geometric modelling power and simplicity of CSG. Solid 
modelling is also advantageous for simulating optical 
phenomena which involve light passing through the body of a 
solid. 
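
The need for all intersections can be seen in a small sketch of the merge step. The representation is an assumption made for illustration: each solid contributes a sorted list of (t_in, t_out) spans along the ray, and the function and operator names are hypothetical rather than taken from PORTRAY.

```python
def merge(a, b, op):
    """Combine two sorted lists of (t_in, t_out) spans along one ray."""
    inside = {
        "union": lambda x, y: x or y,
        "intersection": lambda x, y: x and y,
        "subtraction": lambda x, y: x and not y,
    }[op]

    def covers(spans, t):
        return any(lo <= t < hi for lo, hi in spans)

    # Every span boundary is a point where the result may change.
    events = sorted({t for span in a + b for t in span})
    out = []
    for lo, hi in zip(events, events[1:]):
        if inside(covers(a, lo), covers(b, lo)):
            if out and out[-1][1] == lo:
                out[-1] = (out[-1][0], hi)    # extend the previous span
            else:
                out.append((lo, hi))
    return out


a = [(1, 5)]   # ray is inside solid A for 1 <= t < 5
b = [(0, 3)]   # ray is inside solid B for 0 <= t < 3
print(merge(a, b, "union"))         # [(0, 5)]
print(merge(a, b, "intersection"))  # [(1, 3)]
print(merge(a, b, "subtraction"))   # [(3, 5)]
```

In the subtraction case the visible surface lies at t = 3, which is the far intersection with B, not the nearest intersection with either operand. Discarding all but the nearest intersection of each primitive would therefore give the wrong answer.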

Although the three CSG operators are inherently binary 
operators which take exactly two operand expressions, the 
PORTRAY SDL allows the use of “m-ary” (associative) 
union and intersection operations as a notational convenience. 
This reduces the need for the user to fully parenthesize 
complex descriptions to explicitly indicate the operands of 
every operator. 

There is nothing sacred about the particular set of four 
primitive solids which are used by PORTRAY. In fact, the 
programs are designed so that a new primitive can be added 
simply by programming routines to find intersections, normal 
vectors, and texture coordinates for the new primitive, and by 
adding a line to a single table which links the new routines 
into the rest of the system. In future, primitives may be added 
to PORTRAY to more easily describe irregular, natural 
objects. CSG with simple quadric primitives is an effective 
means of describing most man-made objects, but the great 
complexity of natural objects would be modelled better by 
primitives such as fractal surfaces. 

2.2 Optical Modelling 

Shape is only one aspect of an object which must be modelled 
by an image synthesis system. We use the term “optical
modelling” to refer to the other attributes of an object (color,
texture, reflectance, transmittance, etc.) which determine how 
a ray of light interacts with the object. PORTRAY provides a 
wide variety of optical modelling techniques. 

One of the most basic optical characteristics is color. The 
PORTRAY user may specify colors using an English-like 
scheme which is a simplified version of the color naming 
system (CNS) described in [3], or more precisely in terms of 
hue, saturation, and value. Both the CNS and HSV color 
models are generally believed to be easier for people to use 
than the RGB color model which PORTRAY uses for its 
internal computations. 

The “shininess” attribute is used to specify the reflectance of
a surface. The user may specify shininess in a pseudo-English
fashion, using the keywords SHINIEST, SHINIER, SHINY,
DULL, DULLER, and DULLEST. Alternatively, the
reflectance of a surface may be specified numerically. The
“smoothness” attribute controls the size of specular
highlights which appear on a surface. A shiny smooth surface
has smaller, sharper highlights than a shiny rough surface.
The optical model simulates smoothness variations by
controlling the parameter m in the Beckmann function that
determines the directional distribution of surface microfacets
in the Cook-Torrance reflection model [4]. Extremely smooth
surfaces (specified as SMOOTHEST by the user) have ray
traced reflections and transmissions.

Transmission and refraction of light through objects are
controlled by the “transparency” attribute. The user may
specify the index of refraction and a light scattering factor for
each transparent object. The light scattering factor is
expressed as a distance which a light ray would have to travel 
through the material in order to be reduced to one-half its 
original brightness. The relative contribution of the reflected 
and transmitted rays at an interface between transparent 
materials is determined using the Fresnel equations of 
physical optics. Figure 3 is an image of a scene containing a 
wine glass, which illustrates the use of reflection and 
refraction. If the index of refraction is specified as FAKE, 
PORTRAY allows rays striking the surface to be transmitted 
straight through without refraction. 
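
The two quantities described in this paragraph can be written out as a short sketch. The function names are hypothetical, and the Fresnel formula shown is the standard one for unpolarized light at a boundary between two dielectrics, which is an assumption about the exact form PORTRAY uses.

```python
import math


def attenuation(distance, half_distance):
    """Fraction of brightness left after travelling `distance` through a material
    whose light scattering factor (half-brightness distance) is `half_distance`."""
    return 0.5 ** (distance / half_distance)


def fresnel_reflectance(cos_i, n1, n2):
    """Reflected fraction for unpolarized light passing from index n1 to index n2.

    cos_i is the cosine of the angle of incidence. The transmitted fraction
    is one minus the returned value.
    """
    sin_t2 = (n1 / n2) ** 2 * (1.0 - cos_i ** 2)
    if sin_t2 > 1.0:
        return 1.0                        # total internal reflection
    cos_t = math.sqrt(1.0 - sin_t2)
    r_s = ((n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)) ** 2
    r_p = ((n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)) ** 2
    return 0.5 * (r_s + r_p)


print(attenuation(2000.0, 1000.0))          # two half-distances leave one quarter
print(fresnel_reflectance(1.0, 1.0, 1.5))   # normal incidence from air into glass
```

At normal incidence the formula reduces to ((n1 - n2) / (n1 + n2))^2, about 4% for an index of 1.5, so most of the energy goes to the transmitted ray; at grazing angles the reflected ray dominates.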

The “pure” attribute is used to distinguish between composite 
materials such as plastics, where the color of highlights 
depends only on the color of incident light, and pure materials 
such as metals, where the color of highlights is influenced by 
the body color of the material. 

The “paint” attribute is used to specify a variety of texturing
techniques. For each texture, the user specifies a built-in 
texture function and a texture color. PORTRAY interpolates 
between the normal object color and the paint color according 
to the texture function. Some of the texture functions make 
use of disk files of textural information. In such cases, the 
user specifies the texture file by name. Texture files are stored 
in the same format as the images produced by PORTRAY (see 
section 5) and PORTRAY may use several texture files during 
the rendering of a single scene. The scene depicted in Figure 
4 makes use of nine texture functions and seven different 
texture files. 

PORTRAY has been used to experiment with a new texturing 
technique called “solid texturing” [5]. Solid texture
functions proved to be easy to add to the library of built-in 
texture functions. The left and right spindles in Figure 5 show 
the application of two of these solid texture functions. 

2.3 SDL Statements 

A scene description in PORTRAY SDL consists of a sequence 
of statements. There are several types of statements, which 
are described in the following paragraphs. 

CAMERA, TARGET, and FOCAL LENGTH statements 
specify the position, orientation, and focal length of the 
camera which is simulated by ray tracing. Since 35 mm 
cameras are popular and familiar, the simulated camera is 
designed so that the focal lengths of the lenses used with a 35 
mm camera can be used in the FOCAL LENGTH statement to 
obtain similar effects. 

OBJECT statements specify the CSG expressions and optical 
attributes which describe the objects in the scene. 

LIGHT, AMBIENT, and BACKGROUND statements specify 
the intensity, color, position, and type of light sources, and the 
color of the infinitely large background sphere which 
surrounds the scene. A direct light source may be specified as 
a point source, a beam of parallel rays from a given direction, 
or a focused spotlight with a particular concentration,
direction, and solid angle [6]. The user may also indicate
whether or not each light source should cast shadows.

Note: All images were originally in color.

The FOG statement specifies atmospheric attenuation of light 
rays. Rays are faded toward the background color as an 
exponential function of the distance the ray travels. The FOG 
statement can be used to simulate day or night fog, underwater 
conditions, haze, or aerial perspective, depending on the 
background color and fog density. The image shown in
Figure 6 was made using the FOG statement with a light gray
background color. This image is otherwise identical to
Figure 2.
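
The exponential fade can be sketched in a few lines. The function name, the density parameter and the RGB tuple representation are assumptions for illustration; the paper states only that the fade is an exponential function of the distance travelled.

```python
import math


def apply_fog(color, background, distance, density):
    """Blend a ray's color toward the background color.

    The surviving fraction of the original color is exp(-density * distance),
    so nearby surfaces are almost unchanged and distant ones approach the
    background color.
    """
    keep = math.exp(-density * distance)
    return tuple(keep * c + (1.0 - keep) * b for c, b in zip(color, background))


red = (1.0, 0.0, 0.0)
light_gray = (0.8, 0.8, 0.8)
for d in (0.0, 1000.0, 5000.0):
    print(d, apply_fog(red, light_gray, d, density=0.0005))
```

With a dark background the same formula would suggest night fog or deep water, which is consistent with the range of effects listed above.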

The INCLUDE statement allows scene descriptions to be 
broken up into several files, with a main description including 
sub-files as necessary. This makes management of complex 
scenes easier, particularly in cases like animation, where 
object descriptions may be the same from scene to scene, with 
only the positions and orientations changed. 

The CLASS statement allows a particular object description to 
be given a class name. Instances of the class may then be 
used as if they were built-in primitives. Stating it differently, 
the primitives are really built-in classes. A class instance may 
be scaled, rotated, positioned, colored, textured, etc. and may 
also be used as part of another class or object description. 

The following is the SDL description used to produce the 
image in Figure 3: 

/* Description of the wine glass ... */
Object is (smoothest shiniest trans(1000,1.65)
             white cone at (0,100,0) scale (60.5,200,200)
             rotate (0,0,-90) +
           smoothest shiniest trans(1000,1.65)
             white cylinder at (0,0,0) scale (40.5,200,200)
             rotate(0,0,90)) +
          (smoothest shiniest trans(1000,1.65)
             white cylinder at (0,350,0) scale (300,20,20)
             rotate (0,0,-90) +
           (smoothest shiniest trans(1000,1.65)
              white cone at (0,300,0)
              scale (400,300,300) rotate (0,0,90) -
            smoothest shiniest trans(1000,1.65)
              white cone at (0,320,0)
              scale (400,300,300) rotate (0,0,90)))

/* ... and of the wine itself ... */
Object is smoothest shiniest trans(1000,1.36) red cone
          at (0,320,0) scale (280,210,210) rotate (0,0,90)

/* Description of the backdrop ... */
Object is white block at (0,-20,0) scale (2000,20,2000)
Object is white block at (0,2000,-2000) scale (2000,2000,20)

/* Lighting and camera parameters ... */
Light spot (1,62) intensity 0.8 white
      at (0,1000,-100) toward (0,0,-2000)
Ambient intensity 0.30 white
Background light gray
Camera at (0,1500,4000) target at (0,400,0)

3. Scene Compilation 

The PRCOMP program uses an LALR(1) parser, generated by
the UNIX YACC utility, to parse the SDL description and
build internal CSG expression trees for object and class
descriptions. All instances of user-defined classes are
expanded by substituting the class definition in place of the
instance. The intermediate file output by PRCOMP is entirely
in terms of built-in primitives. PRCOMP also expands m-ary
union and intersection operators into balanced trees of binary
union and intersection operators. Analysis shows that ray
tracing these balanced binary trees is considerably more
efficient than ray tracing the m-ary operators directly
(logarithmic versus linear time complexity).
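
The expansion into a balanced tree can be sketched as follows. The tuple representation of operator nodes and the function names are assumptions, not PRCOMP's internal format; the sketch only shows that the depth of the resulting tree grows logarithmically with the number of operands.

```python
def balance(op, operands):
    """Expand an m-ary union or intersection into a balanced tree of binary nodes.

    Leaves are returned unchanged; internal nodes are (op, left, right) tuples.
    """
    if not operands:
        raise ValueError("an operator needs at least one operand")
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balance(op, operands[:mid]), balance(op, operands[mid:]))


def depth(tree):
    """Number of operator levels between the root and the deepest leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


tree = balance("union", ["a", "b", "c", "d", "e", "f", "g", "h"])
print(depth(tree))   # 3 levels for 8 operands
```

A left-to-right chain of binary unions over the same eight operands would be seven levels deep, so any work that is proportional to tree depth, such as repeated merging of one intersection list into an accumulating result, benefits from the balanced form.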

PRCOMP uses the location, rotation, and scaling information 
provided in the SDL description to generate a transformation 
matrix for each primitive (leaf node) in the output CSG 
description. PORTRAY uses this matrix to transform rays 
from the scene coordinate system to the local coordinate 
system of a particular primitive during intersection 
calculations (see section 4). PRCOMP also generates the 
inverse of this matrix, which is used to transform normal 
vectors from the local coordinate system to the scene 
coordinate system. 

At its discretion, PRCOMP may generate a bounding volume 
for a given primitive instance or CSG expression subtree. 
PORTRAY uses such bounding volumes in order to make 
quick comparisons between a ray and a CSG expression so 
that detailed intersection calculations need not be done for 
rays which obviously do not pass near the object described by 
the expression. At present, the bounding volumes are boxes 
aligned with the axes of the scene coordinate system, but 
some experimentation with other bounding volumes [7] is 
planned. 

4. Rendering 

PORTRAY renders an image of a compiled scene description 
by tracing a ray from each image pixel to the scene [8]. These 
primary rays may generate subsidiary rays upon striking a 
surface which reflects and/or refracts the ray. This process 
may continue recursively to produce a tree of rays, whose 
depth is controlled by an adaptive scheme [9] and by a 
“hard” depth limit. Additional shadow rays are traced from 
each surface intersection to each shadow-casting light source, 
in order to determine whether or not light from the source 
reaches the intersection point on the surface. 

Anti-aliasing is performed by adaptively supersampling when 
adjacent pixel values differ sharply [8]. This anti-aliasing 
technique may overlook very small details that “fall between
the cracks”, but is much less expensive than supersampling
throughout the image. Recently introduced stochastic 
sampling techniques [10,11] are being considered as an 
alternative anti-aliasing scheme for PORTRAY. 

Ray tracing a CSG expression [12] involves a postorder 
traversal of the CSG expression tree. At each internal 
(operator) node of the tree, lists of intersections from the left 
and right subtrees are merged according to the semantics of 
the operator (union, intersection, or subtraction). When the 
root of a particular expression tree (object description) is 
reached, the intersection nearest the ray origin is chosen as the 
intersection between the ray and the object. In some cases, for 
example, when FAKE (non-refractive) transparency is 
specified, PORTRAY uses subsequent intersections in the 
intersection list to render an object. The earth hologram 
display in Figure 4 is one application of fake transparency. 

As mentioned earlier, intersections between a ray and a
primitive are performed by transforming the ray from scene 
space to a local coordinate system in which the primitive has a 
simple, canonical form. For example, the spherical primitive 
is always a unit sphere centered at (0,0,0) in its local 
coordinate system, even though it might be positioned at 
(1000,2000,3000) and stretched into an ellipsoid by unequal 
scaling in the SDL description. It is straightforward to convert 
a ray into the local coordinate system using the transformation 
matrix supplied by PRCOMP. However, it is more difficult to 
convert the surface normal vector at the intersection point 
from the local coordinate system back to the scene coordinate 
system. The problem arises because angles are not preserved 
by unequal scaling, so that a vector perpendicular to the 
surface in the local coordinate system may no longer be 
perpendicular to the surface after the transformation. 
PORTRAY avoids this problem by generating three points in 
the plane tangent to the surface, transforming these points 
from the local coordinate system to the scene coordinate 
system, and then reconstructing the tangent plane and the 
normal vector in the scene coordinate system. 
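
The three-point reconstruction can be sketched for the simple case of a unit sphere scaled unequally and translated. The helper names, the choice of tangent vectors, and the restriction to positive axis-aligned scaling (no rotation) are assumptions made to keep the example short.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def tangent_basis(n):
    """Two unit vectors u, v in the tangent plane, ordered so that u x v = n."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(helper, n))
    v = cross(n, u)
    return u, v


def to_scene(p, scale, offset):
    """Local-to-scene transformation: unequal scaling followed by translation."""
    return tuple(s * x + o for x, s, o in zip(p, scale, offset))


def scene_normal(p_local, n_local, scale, offset):
    """Transform three tangent-plane points, then rebuild the normal from them."""
    u, v = tangent_basis(n_local)
    p0 = to_scene(p_local, scale, offset)
    p1 = to_scene(add(p_local, u), scale, offset)
    p2 = to_scene(add(p_local, v), scale, offset)
    return normalize(cross(sub(p1, p0), sub(p2, p0)))


# A point on the unit sphere; in local space the normal equals the position.
p = normalize((1.0, 1.0, 0.0))
n = scene_normal(p, p, scale=(2.0, 1.0, 1.0), offset=(1000.0, 2000.0, 3000.0))
print(n)
```

For the ellipsoid produced by this scaling, the true normal is proportional to (px/2, py, pz), and the reconstructed vector agrees with it. Applying the scaling directly to the local normal would instead give a vector proportional to (2px, py, pz), which is no longer perpendicular to the surface.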

PORTRAY uses a number of techniques to improve the 
performance of the rendering process. In its simplest form, 
ray tracing is very much a “brute force” technique, since it 
exhaustively computes all intersections between every ray and 
every object in the scene. PRCOMP computes a bounding 
rectangle in image space for each object, so that PORTRAY 
knows which pixels may contain a direct image of a given 
object. PORTRAY then uses the bounding rectangles to 
efficiently determine which objects can be intersected by a 
primary ray from a given pixel. Other objects are excluded 
from consideration during the tracing of that primary ray. 

The benefits of the bounding rectangles are limited to primary 
rays. PORTRAY also uses bounding boxes generated by 
PRCOMP to quickly exclude objects from consideration in 
tracing any given ray. If a ray does not intersect the bounding 
box of an object, then the ray cannot intersect the object at all. 
Checking a ray against a bounding box is only slightly faster 
than generating the intersections between a ray and a 
primitive. However, bounding boxes really pay off for objects 
with complex CSG descriptions. A single bounding box test 
may exclude from consideration an entire tree or subtree, thus 
saving dozens or hundreds of primitive intersection 
calculations. 
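
A sketch of the pruning follows. The paper does not say how PORTRAY tests a ray against a box; the slab method used here is a standard technique swapped in for illustration, and the dictionary node layout with "box", "name" and "children" keys is hypothetical.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test of a ray (t >= 0) against an axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False              # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidates(node, origin, direction):
    """Names of the primitives that still need a detailed intersection test."""
    if not ray_hits_box(origin, direction, *node["box"]):
        return []                         # one test excludes the whole subtree
    if "name" in node:
        return [node["name"]]
    return [name for child in node["children"]
            for name in candidates(child, origin, direction)]


# Hypothetical scene: two primitives under one operator node.
SCENE = {
    "box": ((-1.0, -1.0, -1.0), (5.0, 1.0, 1.0)),
    "children": [
        {"name": "a", "box": ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))},
        {"name": "b", "box": ((3.0, -1.0, -1.0), (5.0, 1.0, 1.0))},
    ],
}
print(candidates(SCENE, (0.0, 0.0, -5.0), (0.0, 0.0, 1.0)))    # ['a']
print(candidates(SCENE, (0.0, 10.0, -5.0), (0.0, 0.0, 1.0)))   # []
```

The second ray is rejected at the root, so neither primitive is examined. With a subtree of dozens of primitives the saving from that single test grows accordingly.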

Testing a ray against a bounding box is fruitful only if the test 
proves negative and the object within the box is excluded 
from further consideration. If the bounding box test is 
positive, the program must go on to intersect the ray with the 
object, so the box test is a wasted operation. (This is why it is 
desirable for the bounding volume to fit the object as tightly as 
possible [7].) PORTRAY exploits a property which we call 
bounding volume coherence to reduce the number of positive 
bounding box tests. Bounding volume coherence is based on 
the observation that rays traced from adjacent pixels follow 
similar paths, even down through subsidiary levels of the ray 
tree. Thus, there is a high probability that a bounding box test 
which was positive for the previous ray tree will be positive 
for the current ray tree. When a bounding box tests positive, 
PORTRAY flags it with a value indicating the ray tree 
position of the ray being traced. On the next ray, flagged 
bounding boxes are assumed to test positive at the same
position in the ray tree, and the bounding box test is not 
performed. There is a performance penalty if the assumption 
is false, but the appearance of the image is unaffected. 
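
The flagging scheme can be sketched as below. Several details are assumptions, since the paper does not give them: the class and attribute names, the use of a string to encode a ray tree position, and the rule that a flag is dropped when an assumed-positive box turns out to contain no intersection. Rays are reduced to single numbers so that the example stays short.

```python
class CoherentBox:
    """Bounding box that remembers where in the ray tree it last tested positive."""

    def __init__(self, box_test, intersect_object):
        self.box_test = box_test                  # callable: ray -> bool
        self.intersect_object = intersect_object  # callable: ray -> hit or None
        self.flags = set()                        # flagged ray tree positions
        self.box_tests = 0                        # number of box tests performed

    def trace(self, ray, position):
        assumed = position in self.flags
        if not assumed:
            self.box_tests += 1
            if not self.box_test(ray):
                return None
            self.flags.add(position)              # positive test: flag this position
        hit = self.intersect_object(ray)
        if assumed and hit is None:
            # Assumption: a wrong guess clears the flag, so the next ray is
            # tested properly. The image is unaffected either way.
            self.flags.discard(position)
        return hit


# One-dimensional stand-in: the box spans [0, 10] and the object spans [2, 8].
box = CoherentBox(lambda x: 0 <= x <= 10,
                  lambda x: x if 2 <= x <= 8 else None)
hits = [box.trace(x, "primary") for x in (3, 4, 5, 20, 21)]
print(hits, box.box_tests)
```

Of the five rays, only the first and the last trigger a box test. The fourth ray shows the penalty case: the flag causes a wasted object intersection, but the returned result is still correct.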

PORTRAY ray traces about 10% faster when bounding 
volume coherence is used. This is particularly interesting, 
since an attempt to make more general use of ray coherence, 
reported in [13], indicated that no performance benefit was 
obtained. 

PORTRAY generated the image in Figure 3 at 512x512 
resolution in 70 minutes on a Pyramid 90x, tracing a total of 
606 thousand rays. The more complex image in Figure 4 was 
rendered at the same resolution in 324 minutes. Fewer rays 
(502 thousand) were traced in this case, because fewer pixels 
contained reflective or refractive objects. 

5. Image Format and Tools 

At a conceptual level, PORTRAY images are rectangular 
arrays of pixels. Each image consists of several UNIX files, 
including an image description file (IDF). The IDF is a file of 
ASCII text which describes the image and each of the other
files which form part of the image. Table 1 lists the various 
files which may exist as part of an image. A particular image 
need not contain all of these files. The height, width, and 
depth of each of the image data files is described in the IDF; 
the data files themselves contain only the pixel intensity 
information. Data files may optionally be run-length encoded 
to reduce storage cost. 
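
Run-length encoding of an 8 bpp data file can be sketched as follows. The paper does not specify the encoding, so the (count, value) byte-pair layout and the 255-pixel run limit are assumptions chosen for simplicity.

```python
def rle_encode(pixels):
    """Encode a byte string as (count, value) pairs, with runs capped at 255."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p and runs[-1][0] < 255:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return bytes(b for run in runs for b in run)


def rle_decode(data):
    """Expand (count, value) pairs back into the original pixel values."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out.extend([data[i + 1]] * data[i])
    return bytes(out)


scanline = bytes([7, 7, 7, 0, 0, 9])
packed = rle_encode(scanline)
print(list(packed))                      # [3, 7, 2, 0, 1, 9]
print(rle_decode(packed) == scanline)    # True
```

Synthetic images with large uniform regions, such as a plain background, compress well under this kind of scheme, while noisy or heavily textured data can grow, which may be one reason the encoding is optional.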

The multiple-file image representation was chosen to provide 
a high degree of flexibility in the manipulation of image data. 
For example, an RGB image with 24 bits per pixel (bpp) 
would be stored in three separate files, each with 8 bpp. The 
red, green, and blue data could be accessed separately or as a 
single RGB image. The format of the IDF reinforces this 
flexibility, since the IDF can be modified with an ordinary text 
editor when special handling is needed. Thus, it is not always 
necessary to build a new image manipulation tool, even when 
unforeseen needs arise. 


Table 1: Image File Types

Filename Suffix   Description of Contents
idf               image description file
red               red image data
grn               green image data
blu               blue image data
cvg               pixel coverage data [15]
gry               gray-scale image data
lut               color lookup table (LUT)
enc               image encoded for LUT
log               history of image (text)


Several tools have been developed to process PORTRAY 
images, including the following: 

• iencode , which generates an 8 bpp image using a given 
color lookup table, from a 24 bpp RGB image, using the 
algorithm described by Heckbert [14]. 
• ilut , which generates a color lookup table containing 
colors which are appropriate for displaying a given 24 bpp 
RGB image. The lookup table is generated from a color 
histogram of the RGB image, using the “median cut” 
color space subdivision algorithm, also described in [14]. 
ilut and iencode are used to prepare a 24 bpp image for 
display on an 8 bpp color graphics system, such as an 
AED terminal or a Sun workstation. PORTRAY generates 
all images in 24 bpp RGB form. 

• icomp , which combines images according to a specified 
composition operator [15]. Figure 4 is an example of a 
composite image produced using icomp. The starfield 
visible through the space station windows was separately 
generated, and then composited with the PORTRAY 
image of the space station and planet. 

• traditional image processing algorithms, including 
histogram equalization and gamma correction, are used to 
process texture images and to adjust contrast, brightness, 
and color of images prior to photographing them with a 
film recorder. 
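
The division of work between ilut and iencode can be sketched as follows. This is a simplified illustration of median-cut quantization, not the implementation used by the tools: it assumes a non-empty list of (r, g, b) tuples, splits the box with the largest channel extent at its median, and represents each box by its mean color. The details of the algorithm in [14] may differ, and the names `median_cut` and `encode` are hypothetical.

```python
def median_cut(colors, n_entries):
    """Build a lookup table of at most n_entries colors (ilut-like step)."""
    boxes = [list(colors)]

    def extent(box, ch):
        values = [c[ch] for c in box]
        return max(values) - min(values)

    def longest(box):
        # Channel (0=r, 1=g, 2=b) along which the box is widest.
        return max(range(3), key=lambda ch: extent(box, ch))

    while len(boxes) < n_entries:
        # Choose the box with the largest spread along any channel.
        idx = max(range(len(boxes)),
                  key=lambda i: extent(boxes[i], longest(boxes[i])))
        box = boxes[idx]
        ch = longest(box)
        if extent(box, ch) == 0:
            break  # every remaining box holds a single color
        box = sorted(box, key=lambda c: c[ch])
        mid = len(box) // 2
        boxes[idx:idx + 1] = [box[:mid], box[mid:]]

    # Represent each box by the (integer) mean of its colors.
    return [tuple(sum(c[ch] for c in box) // len(box) for ch in range(3))
            for box in boxes]


def encode(pixels, lut):
    """Map each RGB pixel to the index of the nearest LUT entry
    (iencode-like step, using squared Euclidean distance)."""
    def nearest(c):
        return min(range(len(lut)),
                   key=lambda i: sum((c[k] - lut[i][k]) ** 2 for k in range(3)))
    return [nearest(c) for c in pixels]
```

For an 8 bpp display, `n_entries` would be 256; the result of `encode` is then the data of an enc file and the table itself the contents of a lut file, in the terms of Table 1.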

6. Conclusions 

PORTRAY is a flexible image synthesis system which derives 
much of its power from a high-level scene description 
language. Constructive solid geometry is an excellent, 
easy-to-use geometric modelling technique for man-made objects, 
but is less appropriate for natural objects. Ray tracing is an 
expensive rendering technique, but is well suited to CSG 
models and is capable of simulating a wide variety of optical 
phenomena. PORTRAY incorporates various techniques for 
speeding up the ray tracing of CSG models, including a novel 
technique for exploiting bounding volume coherence. To 
further speed up ray tracing, we are planning to experiment 
with parallel ray tracing algorithms on an experimental 16- 
processor INMOS Transputer system, being constructed at the 
University of Saskatchewan. 
ABSTRACT 

This paper presents an adaptive subdivision 
algorithm for fast ray tracing, implemented on a 
parallel architecture using a three dimensional 
computer array. The object space is divided into 
several subregions, and the boundary surfaces of the 
subregions are adaptively slid to redistribute the 
loads of the computers uniformly. Since the shape 
of each subregion is preserved as an orthogonal 
parallelepiped, the redistribution overhead can be 
kept small. The algorithm is quite simple but can 
avoid load concentration on a particular computer. 

Simulation results reveal that the adaptive 
space subdivision algorithm with sliding boundary 
surfaces reduces the computing time to between 3/4 
and 1/5 of that for the conventional space 
subdivision algorithm with no redistribution, which 
itself reduces the computing time almost in 
proportion to the number of computers. 

KEYWORDS: sliding, adaptive, parallel, ray tracing, 
subdivision, boundary. 


Introduction 

Among general image synthesis methods 
available today, ray tracing [1] is probably the most 
realistic technique, because it models a wide range 
of natural phenomena. However, it requires a large 
amount of computing time. The calculation of 
ray-object intersections requires 75-95 percent of the 
total computing time [1]. 

Various approaches have been attempted toward 
speeding up ray tracing. Previous research 
reports are categorized as follows: 

(1) Multicomputer system by image 
subdivision [2]: An image to be generated is divided 
into several subimages and each of the computers 
generates one or more subimages independently. 

(2) Vectorization^: Since the ray-object 
intersection calculations belonging to the 
different pixels and the intensities of the 
different pixels are calculated completely 
independently, the calculations can be vectorized. 

(3) Space subdivision 2 *»^: The three 
dimensional space of a scene to be rendered is 
divided into subregions. The rays which are cast 
into each subregion are tested for intersection 
with the objects contained within the subregion. 
Data on rays that exit a subregion are passed to 
the appropriate neighbor. 

The first two approaches do not reduce the number of 
ray-object intersection calculations, but speed up 
the intersection process itself through parallel 
processing and specialized hardware. 

On the other hand, the space subdivision 
method can reduce the number of calculations, 
because it tests rays for intersection only with 
the objects contained within the subregions that 
rays pass through, instead of all objects in the 
entire scene. 

Recent work has applied a parallel 
architecture to this space subdivision algorithm 1 *. 
This architecture uses a three dimensional computer 
array, each computer of which is assigned to one or 
more subregions. The shapes of the subregions are 
"general cubes", which are general hexahedra, and 
the shapes are adaptively controlled to realize a 
roughly uniform load distribution. This algorithm 
has the following problems: 

1) Load is transferred among the subregions by 
moving corners of a general cube indicating the 
subregion. The moving operation to distribute the 
load is quite difficult because moving one corner 
affects the loads of the eight subregions holding 
it in common. The problem is how to select the 
corner to be moved and how to determine the 
direction and the length to move the corner in a 
three dimensional space in order to distribute the 
loads of the eight subregions at once. 

2) When rays exit the subregion, the 
neighboring subregion that rays are passed to is 
determined by boundary-intersection calculation. 
However, boundary-intersection testing for general 
cubes is a significant overhead. 

3) When a corner of the subregion is moved, an 
appropriate part of the object descriptions 
contained within the subregion is determined and 
the pertinent data is passed to the neighboring 
subregions. This operation is as expensive as 
boundary-intersection testing. 

The method of moving corners would greatly affect 
the performance of this algorithm. However, the 
problem of how to move a corner to distribute the 
load cannot be easily solved because of the 
difficulties mentioned above. 

This paper presents a new approach that solves 
the problems above. The shapes of the subregions 
are limited to orthogonal parallelepipeds, and load 
is transferred by sliding the boundary surfaces of 
the subregions. 

The following sections will discuss a simple 
subdivision algorithm for parallel architecture and 
give simulation results. 


New Space Subdivision Algorithm 

The essential algorithm characteristics are: 

1) The three dimensional space of a scene to 
be rendered is divided into several subregions by 
planes perpendicular to a coordinate axis ( Fig.1 ). 
Positions x_i, y_j, and z_k of the dividing planes 
have integer coordinate values. Thus, each 
subregion is an orthogonal parallelepiped which 
consists of several unit cubes. A unit cube is a 
cube whose edge length is 1 and whose edges are 
parallel to the coordinate axes. 

2) Each computer of the three dimensional 
computer array is assigned to one subregion and 
maintains only the object descriptions contained 
within that subregion. 

3) Each computer has 6 connections to 
neighboring computers in order to pass messages 
( Fig.2 ), which consist of information about rays 
and redistribution. Each computer also has a 
direct connection to a host computer and to a frame 
buffer. Messages regarding object descriptions are 
sent directly from the host computer to all computers 
as broadcast messages. Each computer determines 
which objects are contained within its own subregion 
and preserves only their descriptions. 

4) Initial rays from the eye point are created 
by all computers in parallel. An image to be 
generated is divided into subimages and each 
computer is assigned to one of the subimages. Each 
computer creates the initial rays which pass 
through its own subimage. Then each initial ray is 
transferred to the subregion where the 
initial ray starts. After each ray has 
reached the appropriate subregion, it is tested 
for intersection with the objects within the 
subregion. 

5) Rays that exit the subregion are passed to 
neighboring subregions via connection between 
computers. The three dimensional digital line^ is 
generated for efficient tracing of rays. The three 
dimensional digital line is an array of unit cubes 
pierced by the ray ( Fig.3 ). 

To determine the array of unit cubes, two 
DDAs ( Digital Differential Analyzers ) synchronously 
generate two digital lines, which are the 
projections of the three dimensional digital line 
onto two coordinate planes. The error values of the 
two DDAs are compared to decide the correct direction 
of the three dimensional digital line ( Fig.4 ). 

Since the subregion consists of several unit 
cubes and the three dimensional digital line is 
generated only by addition of integer values, rays 
can rapidly traverse the subregion and also enter 
the appropriate neighboring subregion by using the 
three dimensional digital line. 

6) For redistribution, the boundary surface 
between two subregions is slid by one unit and a 
part of the load for one subregion is transferred 
to the other subregion. The following section gives 
more details about sliding boundary surfaces. 

Figure 2. 6 connections from a computer 
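
The traversal of unit cubes in step 5 can be illustrated with a short sketch. Note that this is not the two-DDA integer formulation described above: for brevity it substitutes the closely related floating point scheme that tracks, for each axis, the ray parameter at which the next cube face is crossed. The function name `cubes_pierced` and its parameters are hypothetical.

```python
import math


def cubes_pierced(origin, direction, max_steps):
    """Yield the integer (i, j, k) coordinates of the unit cubes pierced
    by a ray, starting with the cube that contains the origin.

    Swapped-in technique: per-axis crossing parameters in floating point,
    rather than two synchronized integer DDAs.
    """
    cell = [math.floor(c) for c in origin]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray parameter at the next face crossing
    t_delta = [math.inf] * 3  # parameter increment per unit cube
    for a in range(3):
        d = direction[a]
        if d > 0:
            step[a] = 1
            t_max[a] = (cell[a] + 1 - origin[a]) / d
            t_delta[a] = 1 / d
        elif d < 0:
            step[a] = -1
            t_max[a] = (cell[a] - origin[a]) / d
            t_delta[a] = -1 / d
    yield tuple(cell)
    for _ in range(max_steps):
        a = t_max.index(min(t_max))  # axis whose face is crossed first
        if t_max[a] == math.inf:
            return  # zero direction vector: the ray never leaves the cube
        cell[a] += step[a]
        t_max[a] += t_delta[a]
        yield tuple(cell)
```

Each step changes exactly one coordinate by one unit, so a ray leaving a subregion always enters a face-adjacent cube, which is what allows it to be handed to one of the 6 connected neighbors.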


Sliding Boundary Surfaces 

The adaptive subdivision by sliding boundary 
surfaces is as follows: 

1) At the beginning, one of the three coordinate 
axes is set as the driving axis ( e.g. the x axis in 
Fig.5 ). Boundary surfaces of a subregion that are 
perpendicular to the driving axis are slid by one 
unit along the driving axis to transfer load 
from one subregion to a neighboring subregion. 

The subregion load is related to the number of 
objects contained within the subregion. 
Therefore, the axis along which the numbers of 
objects in the subregions vary most is set 
as the driving axis. 

Figure 3. Three dimensional digital line 

Figure 4. Error values of two DDAs 

2) Each computer measures its running time, during 
which it actually processes rays. Each 
computer also measures its waiting time, during 
which it has no ray to process and is waiting for 
rays to be passed from the neighboring computers. 
The ratio (running time) / (waiting time) is 
defined as the parameter that indicates the load for 
the subregion. 

3) For redistribution, the loads for the two 
subregions on either side of a boundary surface 
are compared at given time intervals by the 
computers assigned to these two subregions. If the 
load for one subregion is lower than that for the 
other subregion and the lower load is under a 
given threshold value, the boundary surface is slid 
by one unit along the driving axis from the lower 
load subregion toward the higher load subregion. By 
this operation, a part of the subregion with the 
higher load ( called the transferred region ) is 
cut away and added to the lower one. 
Simultaneously, the object descriptions contained 
within the transferred region are transferred. 
Thus, the load is simply and efficiently 
transferred. These operations are executed locally 
by the computers assigned to these two subregions. 

Figure 5. Sliding a boundary surface 

Figure 6. Boundary surfaces discrepancy 
( Two dimensional view ) 

4) As each boundary surface is slid for 
redistribution, some discrepancy between the boundary 
surfaces occurs ( Fig.6 ). Since the 
connections between computers are fixed, there 
can be a case wherein a computer has no direct 
connection to the computer assigned to a 
neighboring subregion. In this case, data 
concerning the rays that exit the subregion cannot 
be passed directly to the computer 
assigned to the appropriate subregion. Instead, the 
data on rays are passed to the directly connected 
computer which is assigned to a subregion located 
on the same driving axis as the appropriate 
subregion ( shown by a dotted line in Fig.6 ). 
After that, the data on rays are passed along the 
driving axis until they finally reach the 
appropriate computer. 
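
The sliding rule of steps 2 and 3 can be sketched for one row of subregions along the driving axis. This is an illustrative reading of the text, not the simulator code: the names `load` and `slide_boundaries` are hypothetical, and the guard that keeps every subregion at least one unit wide is an added assumption that the text does not state.

```python
def load(running_time, waiting_time):
    """Load parameter of a subregion: (running time) / (waiting time)."""
    if waiting_time > 0:
        return running_time / waiting_time
    return float("inf")  # never idle: treat as maximally loaded


def slide_boundaries(bounds, loads, threshold):
    """One redistribution pass along the driving axis.

    bounds:    integer plane positions, len(bounds) == len(loads) + 1
    loads:     load parameter of each subregion between the planes
    threshold: only a subregion whose load is below this value may grow
    """
    new = list(bounds)
    for i in range(len(loads) - 1):
        b = i + 1  # interior boundary between subregions i and i + 1
        left, right = loads[i], loads[i + 1]
        if left < right and left < threshold and new[b + 1] - new[b] > 1:
            new[b] += 1  # lightly loaded left subregion grows by one unit
        elif right < left and right < threshold and new[b] - new[b - 1] > 1:
            new[b] -= 1  # lightly loaded right subregion grows by one unit
    return new
```

With `bounds = [0, 4, 8, 12]`, `loads = [0.2, 5.0, 0.1]` and `threshold = 1.0`, the heavily loaded middle subregion is trimmed from both sides and the planes move to `[0, 5, 7, 12]`. When all loads exceed the threshold no plane moves, which corresponds to the avoidance of thrashing between highly loaded subregions.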

Essential characteristics of the method are as 
follows: 

1) For redistribution, only the loads for two 
subregions on both sides of the boundary surface 
perpendicular to the driving axis are compared. 
Thus, the redistribution between the subregions is 
easily determined. 

2) The shape of the subregions is preserved as 
an orthogonal parallelepiped. Since boundary 
surfaces are rectangular and perpendicular to 
coordinate axes, boundary-intersection testing is a 
small overhead. 

3) Only the boundary surfaces perpendicular to 
the fixed driving axis are slid along the axis by 
one unit and the object descriptions contained 
within that transferred region are transferred. So 
the redistribution does not cause a significant 
overhead. 

4) Since the given threshold value stops the 
sliding between highly loaded subregions, 
thrashing, whereby object descriptions are 
repeatedly moved between highly loaded 
computers, is avoided. Thrashing between 
lightly loaded subregions does not affect the 
total performance. 

5) Because of the simplicity of this method, 
it can be easily implemented on a three dimensional 
computer array. 


Results 

The proposed adaptive subdivision algorithm with 
sliding boundary surfaces is simulated to evaluate 
the redistribution effect. The simulation results 
with redistribution are compared with those without 
redistribution when the load is concentrated on 
some particular computers. 

Beforehand, the effect of the conventional 
space subdivision algorithm without redistribution 
is evaluated by simulation. 

A. Simulation Methods 

The algorithms have been written in a C 
program and tested on a Vax-11/780 under the Unix 
operating system. A parallel process simulator^ 
has been created to evaluate the algorithms. The 
simulator virtually causes the computers to run in 
parallel and counts the running time and the 
waiting time of each computer to calculate the 
load. The simulator also counts the computing time 
for generating an image by parallel architecture. 

The objects are all spheres. Each sphere is 
described by its center position, radius, color, 
and reflection parameters, and these parameters are 
generated as uniform random numbers. Therefore, the 
objects are located at random in the space. 

B. Space Subdivision Effect 

First, the effect of the conventional space 
subdivision algorithm is evaluated. 

Figure 7 shows the computing time of the space 
subdivision algorithm without redistribution. 

Result A shows the computing time when only a 
single computer is assigned to all subregions. The 
algorithm implemented on a single computer reduces 
the computing time on the order of ( s = the 
number of the subregions ). 

Result B shows the computing time when each 
computer of the three dimensional computer array is 
assigned to one subregion but the initial rays are 
created only by the computer whose subregion contains 
the eye point. The difference between results A 
and B represents the effect of parallel processing 
using the three dimensional computer array. 

Result C shows the computing time when the 
initial rays are also created by all computers in 
parallel. This space subdivision method with the 
parallel initial ray creation reduces the computing 
time on the order of S^*^. The difference between 
results B and C means the parallel creation effect 
of the initial rays. The parallel initial ray 
creation is effective when S is large. 

Figure 8 shows the computing time when the 
number of objects is changed in the case of result 
C in Fig. 7. The computing time is reduced by 
almost the same order even if the number of 
objects is changed. 
C. Proposed Redistribution Effect 

In order to examine the redistribution effect, 
the region where all objects are located is reduced 
from the entire space to a quarter of the entire 
space. The smaller the region in which the objects 
are located, the more the load is concentrated in 
some particular subregions. 

Figures 9 and 10 show comparisons between 
the computing time with redistribution by sliding 
boundary surfaces and that with no redistribution. 
The horizontal axes of Figs. 9 and 10 represent the 
ratio of the volume of the region where all objects 
are located to that of the entire space. 

When the objects are located uniformly at random 
in the space, the loads are almost 
uniformly distributed from the start, so the 
redistribution is less effective. Even 
so, the redistribution reduces the computing 
time to 3/4 of that for no redistribution. 

The more the objects and the loads are 
concentrated in a part of the subregions, the greater 
the redistribution effect becomes. The computing 
time falls to as little as 1/5 when the objects are 
concentrated in a quarter of the entire space. These 
results mean that redistribution by sliding boundary 
surfaces has an effect equivalent to distributing the 
concentrated objects over the entire space. 

Moreover, Figs. 9 and 10 show almost the same 
redistribution effect, so it is not the number of 
objects but their location in the space that 
controls the redistribution effect. 


Conclusions 

This paper has presented a simple adaptive 
subdivision algorithm implemented on a parallel 
architecture using a three dimensional computer 
array. Boundary surfaces of the subregions are 
adaptively slid to redistribute the loads of the 
computers uniformly. Since the shape of each 
subregion is preserved as an orthogonal 
parallelepiped, the redistribution overhead can be 
kept small. By using this algorithm, the computing 
time is reduced to between 3/4 and 1/5 of that for 
no redistribution. 
ABSTRACT 

Graphics systems using three dimensional models, and 
computing a colour shaded image for a raster display are 
very common, and range widely in performance and cost. 
Despite the numerous variations in rendering techniques, 
visibility determinations, illumination models and 
modelling primitives manipulated, it is important to be 
able to compare them when rendering similar scenes. 

We present here the first results of a series of profiling 
runs of different rendering systems displaying the same 
scenes on the same machine. The systems studied are a 
ray-caster, a system using a depth-buffer for visibility 
determination, and a system using a scan-line Watkins 
algorithm. The first and last systems have an antialiasing 
option. Two types of scenes were used, one made of a 
constant number of polygons varying in size, and the other 
made of parametric surfaces varying in level of subdivision. 

The results, mainly useful for relative comparisons, 
confirm some predicted behaviours. The depth-buffer 
algorithm degrades considerably when the depth complexity 
increases. The ray-caster is not much influenced by the 
number of polygons, but by the total number of pixels 
covered. The most striking result is the large proportion of 
time spent on shading. It is a strong indication that work 
on ways to make shading computations less expensive, and 
to design special hardware for that purpose, would be 
fruitful. 

KEYWORDS: display systems, rendering techniques, 
profiling, shading, visibility determinations. 

RÉSUMÉ 

Les systèmes graphiques qui utilisent des modèles à trois 
dimensions et qui produisent des images ombrées en 
couleur pour des affichages rasters sont maintenant très 
répandus et diffèrent énormément en puissance et en coût. 
En dépit des grandes variations dans les techniques de 
détermination de visibilité, les techniques d'ombrage et les 
techniques de modelage qu'ils utilisent, il est important de 
pouvoir comparer leurs performances quand ils rendent la 
même scène. 

Nous présentons ici les premiers résultats d'une série de 
profilage de différents systèmes d'affichage produisant les 
mêmes scènes sur la même machine. Les systèmes étudiés 
sont un lanceur de rayon, un système utilisant une mémoire 
de profondeur pour déterminer la visibilité, et un système 
utilisant l'algorithme de Watkins avec la conversion en 
ligne de balayage. Le premier et le dernier système ont 
tous les deux une option d'antialiasing. Deux genres de 
scènes ont été utilisées. Un était fait d'un nombre constant 
de polygones dont seule la taille changeait, et l'autre de 
surfaces paramétriques à des niveaux variés de subdivision. 

Les résultats, surtout utiles pour des comparaisons 
relatives, confirment beaucoup de prévisions. La performance 
de l'algorithme de mémoire de profondeur se dégrade de 
façon considérable quand augmente la complexité de 
profondeur. Le lanceur de rayon n'est pas très influencé par 
le nombre de polygones, mais plutôt par le nombre total 
de pixels recouvertes. Le résultat le plus frappant est la 
grande proportion de temps consacrée aux calculs 
d'ombrage. C'est une forte indication du fait que plus de 
recherches pour améliorer l'efficacité de ces calculs et pour 
développer du matériel pour cet effet pourrait s'avérer 
payant. 

MOTS CLÉS: systèmes d'affichage, techniques de rendu, 
profilage, ombrage, détermination de visibilité. 

1. Motivations 

A graphic display system, in the context of this study, is a 
combination of hardware and software which extracts 
object descriptions from an application database, applies 
geometric transformations to create instances of objects, 
determines their projections in a two-dimensional screen 
space, and computes the colour value of each pixel for the 
frame buffer of a raster display device. We will limit 
ourselves to the consideration of systems which handle 
three-dimensional models of objects, and aim at a realistic 
picture. Even with these restrictions, there exist systems 
which vary in performance from real-time to 
real-long-time (several hundred hours per frame), and from 
a few thousand dollars to a few million. 

A display system has three main components (note that we 
are not considering the interaction with the user in this 
study). The first one is the modelling component, which is 
really part of the application. By modelling here we do 
not mean the designing and creation of the models, but 
their retrieval and/or generation on the fly. For example, 
extracting polygons from the database, computing points 
on a parametric surface, and generating stochastic data are 
all modelling operations. The second component involves 
geometric operations. This includes clipping, perspective 
transformations, and mapping to the screen. Two other 
important parts of the geometric operations are visibility 
determinations and shading. They are classified within the 
geometric operations because they directly use the 
geometric properties of the objects and the scenes for 
their computations and none of the screen geometric 
properties. Finally, the third component includes all the 
display operations. In a raster system they are mainly the 
"scan-conversions", the sampling and filtering operations, 
and writing out the image (to a file or directly to the 
frame buffer). 

It is important to note that this subdivision is independent 
of the rendering scheme. For instance, consider a 
depth-buffer system and a ray-caster. They could have the 
same modelling primitives and operations, such as B-spline 
surfaces and adaptive subdivision; the geometric operations 
for the depth-buffer system are mainly as described above, 
and consist of ray intersection calculations and shading for 
the ray-caster; the display operations are scan conversions 
and depth comparisons for the depth-buffer, and 
distributing into "scan buckets", subdividing the screen, 
etc., for the ray-caster. 

An indirect confirmation of the validity of this view is that 
when new algorithms or new hardware appear, they can 
easily be categorized following this scheme. 

New modelling techniques are appearing regularly 
[FoFC82, Reev83, Gard84, Gard85]. In geometry, the 
basic operations do not change very much, but the shading 
techniques have become more sophisticated and more 
expensive [Cook81]. The visibility problem remains an 
active area of research, and even more effort is expended 
to make it more difficult [Whit80]. In this respect 
ray-tracing and ray-casting are properly rendering methods, 
that is, they involve the whole rendering scheme. Therefore 
they include the modelling, geometric and display 
operations. Recently the display side (notably the sampling 
and filtering operations) has received the most attention 
within that technique [Aman84, CoPC84, DiWo85, LeRU85]. 

Hardware development, besides the wholesale 
implementation in hardware of the graphics display system 
for real-time flight simulators [Scha83], has seen attacks 
on specific components: the purely geometric operations 
[Clar82] or the scan conversion component [FGHS85]. The 
design of specialized hardware for modelling, especially 
complex modelling, has only just started [PiFo84]. Notable 
by its absence is hardware design for shading. 

For most systems the goal is the greatest amount of 
realism for the least cost (in time and hardware to run it 
on). Given this, it is surprising that the literature is not 
more abundant on the performance evaluation of such 
systems. 


The information available so far, besides various raw 
timings for pictures ("Figure X took 450 hours of Vax 
time"), is limited to profiling results on one particular 
system [ReBl85] and numbers and analysis of the 
performance of visibility determination algorithms 
[SuSS74]. Crow compared the times spent on modelling, 
geometric operations, shading and filtering [Crow81], a 
study mainly oriented towards a comparison of the latter 
operations. 

While most of the work in performance analysis bore on 
visibility determination, there was mounting evidence that 
the cost of modelling, and even more so of shading, was 
rapidly getting larger. Crow already pointed out that trend 
in [Crow81]. As a result, we have to consider 
carefully the illumination models and the shading 
methods, especially as they relate to the visibility 
algorithms and the display operations. A fast visibility 
algorithm will degrade in performance if the depth 
complexity increases and it continues computing the shade 
for many invisible areas. At this point, an algorithm that 
computes the shading only for the visible surfaces might 
win, even if the visibility determination is less efficient. 

The first task in comparing various systems is choosing the 
scenes they will be run on. Here again it is a fairly complex 
problem, with not as many published results as its 
importance and interest require. Kaplan and Greenberg 
[KaGr79] and Parke [Park80] addressed the problem for 
the analysis of depth buffer algorithms in conjunction 
with various processor architectures. Schmitt [Schm81] did 
the same, but this time to determine empirically the 
complexity of various visibility algorithms. More recently 
Whelan [Whel85] considered the problem again within the 
context of multiprocessor architectures. 

Of course the problem of choosing the right test data is 
not unique to graphics. The problem here is twofold. One 
problem is to determine how to measure scene 
characteristics, and the other is to decide what are the 
characteristics of "typical" scenes. 

2. Methodology 

For this first report we tried to keep the number of 
variables under control, but to have enough variety and 
relevance to be of use to practitioners. The tactic we have 
adopted is to have three different renderers displaying the 
same scenes on the same hardware. The difference 
between the renderers is mainly in their methods of 
determining visibility. 

The first renderer is a ray-caster, which we will call RC 
(see footnote 1). To speed up ray intersection, it 
subdivides screen space into buckets, and each polygon is 
added to a bucket list if its bounding box intersects the 
bucket. For each ray only the polygons listed with the 
bucket intersected by the ray are examined. It has an 
antialiasing mode, where pixels are adaptively subdivided 
if the shades at each corner differ by more than a given 
threshold. It can adaptively subdivide parametric surfaces 
that way, but this was not used here to allow easier 
comparisons. 

1. The ray-caster is based on software originally written by 
Mike Sweeney at the University of Waterloo Computer 
Graphics Laboratory. An improved version is now a 
component of the Alias 1 rendering module. 

The other two renderers share the same front end, known 
as 3d [AmBr86] at the University of Toronto Dynamic 
Graphics Project. The second renderer, which we will 
designate as DB, uses a file depth-buffer to determine 
visibility. It does not have an anti-aliasing option. 

The third renderer uses Watkins' algorithm [Watk70] to 
compute visibility scanline by scanline. We will call it 
WS. It has an antialiasing option which uses the full 
precision in the X direction, and four subscanlines in the Y 
direction. 

They both offer a variety of rendering options, with a 
choice of illumination model, and include texture 
mapping, except for DB. 

2.1. Modelling Primitives 

Since the systems have to render the same scenes, they 
have to use the same modelling primitives. These are 
polygons and B-spline patches, the most prevalent in 
current practice. The scene description is actually defined 
in a scene description language, and various filters generate 
the files for each renderer. 

2.2. Geometric Primitives 

Even though it was not mandatory for this study, the three 
systems all use triangles internally as geometric primitives. 



                          P1      P2      P3 
Modelling Polygons        244     244     244 
Geometric Triangles       488     488     488 
Effective Triangles       485     485     482 
Average Depth             0.60    2.32    8.32 
Average Pixels Covered    321     1253    4526 
Average Area              326     1313    5376 

Table 1. Scene characteristics for polygons 


2.3. Display Primitives 

The three renderers produce a raster image, and were all 
set to output the image to a run-length encoded file. They 
therefore have basically the same output method. They 
were set to output a 512x512 image with 8 bits each of 
red, green and blue per pixel. 

3. Scene Characteristics 

In order to isolate only a few variables, we decided to keep 
the number of modelling primitives constant for each 
series of scenes. In the first series, we distributed 40 cubes 
(6 faces each) roughly uniformly over the window. The 
spacing was chosen so that there was little overlap 
between cubes. Then for the subsequent scenes the cubes 
were linearly doubled around their centres so that the 
depth complexity, the average area of the polygons and 
the number of covered pixels all increased regularly. 

In the second series, we designed a "glass" made of a 6 by 
6 array of B-spline patches, and made three copies of it. 
There are therefore 3 primitives if primitives are control 
point networks, but 108 primitives if each patch is 
considered a primitive. The level of subdivision was set at 
2, 4, 8 and 16 segments to a side. In this series the depth 
complexity and the total number of pixels covered are 
practically constant. The number of geometric primitives 
increases and the average size of each decreases to keep 
the product almost constant. Tables 1 and 2 give the main 
numbers for each series. Figures 1 to 3 and 4 to 6 are line 
drawings of the first six scenes. 

The statistics given here were chosen to indicate the
complexity of the scene. The depth complexity is the average
number of objects in one pixel, and allows us to gauge the
efficiency of visibility determination and of shading. The
average area of the polygons, computed analytically from
the screen coordinates of the vertices, helps in
determining the "polygon set-up time" vs the cost of pixel
calculations. A pixel is deemed covered by a polygon if its
center is inside the polygon. For the scenes used here the
last two numbers are almost equal, but as the polygons
become thinner, the difference can become important.
Other statistics which are not included here can also be
important. The number of edges and the number of pixels
containing edges are examples. In further studies about
the role of filtering and antialiasing, we will have to
consider them, as well as distinguish between silhouette edges
and internal edges. If the scenes are used to test parallel
algorithms, the distribution of the primitives in space and
their aspect ratio should be taken into account.
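The two area statistics can be illustrated with a short sketch. It is not the code used in this study; the function names are hypothetical, polygons are assumed convex, and pixel centres are assumed to lie at (x + 0.5, y + 0.5).

```python
def polygon_area(vertices):
    """Analytic area of a simple polygon from its screen coordinates (shoelace formula)."""
    area2 = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
    return abs(area2) / 2.0


def covers(vertices, px, py):
    """True if (px, py) is inside a convex polygon: it lies on the same side of every edge."""
    sign = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
        if cross != 0:
            s = 1 if cross > 0 else -1
            if sign == 0:
                sign = s
            elif s != sign:
                return False
    return True


def pixels_covered(vertices, width, height):
    """Count the pixels whose centre lies inside the polygon."""
    return sum(
        covers(vertices, x + 0.5, y + 0.5)
        for y in range(height)
        for x in range(width)
    )
```

For a 10 by 10 square the two measures agree, whereas a sliver thinner than one pixel can have a non-zero analytic area and cover no pixel centre at all.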



                            V1      V2      V3      V4
Modelling Patches          108     108     108     108
Geometric Triangles        864    3744   15552   63360
Effective Triangles        864    3744   15552   63358
Average Depth             0.61    0.59    0.59    0.59
Average Pixels Covered     185      41       9       2
Average Area               185      41      10     2.4

Table 2. Scene characteristics for B-splines


The scenes were all lit by three local light sources. The
illumination model used was the same across systems,
being the Lambert cosine law for the polygons, and the Phong
illumination model for the patches, with a shininess of 50.
The back-facing polygons were not culled, and every
polygon was uniformly shaded (no Gouraud shading).
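As a reminder of what the two illumination models compute, here is a minimal sketch. It is not taken from any of the three renderers; the function names, the coefficients kd and ks, and the representation of a light as a (position, intensity) pair are assumptions made for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lambert(normal, to_light):
    """Diffuse term: cosine of the angle between the normal and the light direction."""
    return max(0.0, dot(normalize(normal), normalize(to_light)))


def phong_specular(normal, to_light, to_eye, shininess):
    """Specular term (R . V) ** shininess, R being the mirror reflection of the light direction."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_eye)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0  # the light is behind the surface
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(0.0, dot(r, v)) ** shininess


def shade(normal, point, eye, lights, kd=1.0, ks=0.0, shininess=1.0):
    """Sum the contributions of local (point) light sources at one surface point."""
    to_eye = tuple(e - p for e, p in zip(eye, point))
    total = 0.0
    for light_pos, intensity in lights:
        to_light = tuple(l - p for l, p in zip(light_pos, point))
        total += intensity * kd * lambert(normal, to_light)
        if ks > 0.0:
            total += intensity * ks * phong_specular(normal, to_light, to_eye, shininess)
    return total
```

Every call normalizes several vectors, which is why vector normalization shows up as a measurable share of the shading time.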

Figure 1. Scene P1

Figure 2. Scene P2

Figure 3. Scene P3

Figure 4. Scene V1

Figure 6. Scene V3

4. Hardware and Operating System Characteristics

All three systems were run on a Celerity C1200 with 4 MB
of main memory and three 120 MB disks². The processor
also has a 64K cache. The operating system was Unix†
version 4.2 BSD. The three systems were compiled with
the standard C compiler, using the profiling, optimization
and hardware floating point options. The only difference
was that RC had to use the option which prevents single
floats from being promoted to double, and the other two
could not be run with that option. It is one more reason to be
cautious about any comparison of the absolute times.

5. Results 

Each of the seven scenes was rendered 5 times (RC, RC
antialiased, DB, WS, WS antialiased). The last two scenes
did not run with WS, because the allocated memory was
not big enough. In the interest of brevity, we give only
two tables and three plots.

The time directly spent on I/O is not included in the
tables, but is fairly constant for each system across scenes.
The first thing to note is that modelling plays no role in
the first series of scenes, and the geometric transformations
play little role. The dominant factors are shading and
visibility determination. In fact shading takes from 20% to
more than 90% of the total time. The plots of Figures 7
and 8 show the times for visibility determination and
shading for all five renderings as a function of the average
polygon coverage. Figure 7 shows that the
cost of visibility determination continues to climb briskly
for the ray-caster even when the average coverage reaches
the 5000-pixel range. It is
clear from Figure 8 that DB pays the price for
shading many non-visible areas as the depth complexity
increases. Since all the other systems tend to flatten out as
the depth complexity increases, the depth-buffer is the
worst from the middle of the range explored here.
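The effect of depth complexity on shading cost can be sketched as follows. This is a simplified model and not the code of DB or RC: it assumes that the depth buffer shades every fragment that passes the depth test at the time it is scan-converted, and that a ray-caster shades each covered pixel exactly once. The function names are hypothetical.

```python
def depth_buffer_shades(fragments_per_pixel):
    """Shading operations for a depth buffer that shades each fragment passing the depth test.

    fragments_per_pixel: one list of depths per pixel, in scan-conversion order.
    """
    shades = 0
    for depths in fragments_per_pixel:
        nearest = float("inf")
        for z in depths:
            if z < nearest:      # in front of what is stored so far
                nearest = z
                shades += 1      # shaded now, possibly overwritten later
    return shades


def ray_caster_shades(fragments_per_pixel):
    """A ray-caster finds the nearest surface first, then shades each covered pixel once."""
    return sum(1 for depths in fragments_per_pixel if depths)
```

With polygons arriving back to front, every fragment is shaded, so the depth-buffer count grows with the depth complexity while the ray-caster count stays equal to the number of covered pixels.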

Figure 9 gives the plot of the times for visibility
determination and shading for RC, RCa and DB. The main
features of the statistics for the series of patch scenes are
that, while shading is still an important factor, visibility
determination becomes more important for the
non-ray-casting systems as the polygons get smaller. In fact, as
expected, RC and RCa are relatively insensitive to the
size of the polygons, especially for the shading. The
growth of the cost of visibility determination is less than
could have been expected. In fact the ray-caster is a
winner from around 5000 triangles (remember the warning
against taking these absolute numbers too seriously). For
the first time the cost of modelling begins to be felt, in
particular for RC in scene V4. But it should be stressed
that we are using fairly simple primitives here. It should
also be kept in mind that sooner or later the
storage requirements will hinder RC, and they prevented
us from running WS and WSa on the last two scenes.


2. All the profiling was done on one disk 90% full.
† Unix is a trademark of AT&T Bell Laboratories.


6. Conclusions 

These results are only a small sample of the profiling done
so far. We tried here to define two series of scenes and
choose three renderers so that the number of independent
variables was relatively small. Most numbers confirmed
our prejudices. Renderers using depth buffers do poorly
when the depth complexity increases (here DB had
problems above 2) and ray-casters do well when the polygons
become small. The numbers also confirmed that shading is
a large part of the cost of rendering, and it is therefore
important to support it with efficient routines and
specialized hardware. It
is important to note that DB and WS spend 10% or more
of their time doing vector normalization. The computer
used has hardware square root, so it was not as much of a
factor as it usually is in that type of program.

The data for antialiasing is also mainly indicative. For
both RC and WS antialiasing about doubles the cost and
the increase comes mainly from more shading
computations. We did not study here the relative cost of different
illumination models and shading techniques, but we will
do so as part of this study. Considering the high cost of
shading, it has a significant impact on the total cost.

This brings up the issue of the quality of the picture. Of
course most of these costs are assumed in the belief that a
better picture will ensue. Our test pictures were as
identical as we could expect, and therefore are not much help in
this respect. We plan to complete a similar study with
complex objects that have been modelled for other
purposes with sophisticated shading and up to several
hundred thousand polygons. Then the picture quality,
especially as it relates to antialiasing, will have to be
judged subjectively.


Times            RC    %   RCa    %    DB    %    WS    %   WSa    %

P1  Total       152  100   409  100     ?  100   120  100     ?  100
    Geometry      6    ?     6    1     2    1     2    1     2    0
    Visibility   81    ?   292   71     5    2    11    9    34   14
    Shading      50    ?    96   23     ?   92   103   85   194   82

P2  Total         ?    ?   624  100     ?  100   292  100     ?  100
    Geometry      ?    ?    12    1     2    0     2    0     2    0
    Visibility    ?    ?   409   65     8    1    21    7    73   14
    Shading       ?    ?     ?   30     ?   97     ?   90   418   84

P3  Total         ?    ?     ?    ?   964  100   365  100   645  100
    Geometry      ?    ?    33    4     2    0     2    0     2    0
    Visibility    ?    ?   568   71     8    0    43   11   161   24
    Shading       ?    ?   178   22   950   98   316   86   477   73

Table 3. Some statistics for the polygon scenes


Figure 7. Visibility determination vs polygon coverage


Times            RC    %   RCa    %    DB    %    WS    %   WSa    %

V1  Total       156  100   307  100     ?    ?   166  100   338  100
    Modelling     1    0     1    0     1    0     1    0     1    0
    Geometry      6    3     6    1   0.2    0   0.2    0   0.2    0
    Visibility   79   50     ?    ?    12    6    36   21   102   30
    Shading      63   40     ?    ?    62   92   129   77   235   69

V2  Total       168  100     ?    ?     ?    ?   240  100   538  100
    Modelling     4    2     ?    ?     ?    ?     2    0     2    0
    Geometry     10    5     ?    ?     ?    ?   0.2    0   0.2    0
    Visibility   85   50     ?    ?     ?    ?   103   42   249   46
    Shading      62   36     ?    ?     ?    ?   134   55   286   53

V3  Total       213  100     ?    ?     ?    ?     -    -     -    -
    Modelling    13    6     ?    ?     ?    ?     -    -     -    -
    Geometry     33   15     ?    ?     ?    ?     -    -     -    -
    Visibility   98   46     ?    ?     ?    ?     -    -     -    -
    Shading      62   29     ?    ?     ?    ?     -    -     -    -

Table 4. Some statistics for the patch scenes


Figure 8. Shading vs polygon coverage



Figure 9. Visibility and shading vs subdivision 


USING CACHING AND BREADTH-FIRST SEARCH TO SPEED UP RAY-TRACING

(extended abstract) 

Pat Hanrahan 


Abstract 

Ray-tracing is an expensive image synthesis 
technique because many more ray-surface 
intersection calculations are done than are 
necessary to shade the visible areas of the 
image. This paper extends the concept of 
beam-tracing so that it can be coupled with 
caching to reduce the number of intersection 
tests. Two major improvements are made over 
existing techniques. First, the cache is organized 
so that cache misses are only generated when 
another surface is intersected, and second, the 
search takes place in breadth-first order so that 
coherent regions are completely computed before 
moving on to the next region.


Introduction 

Ray-tracing has attracted considerable 
attention recently because of the super-realistic 
images that can be produced. Lighting and shading 
effects that require information about the global 
environment, such as shadows, reflections and 
refractions, can be calculated by recursively 
tracing rays from the surfaces they intersect 
[Whitted, 1980]. Distributed or stochastic 
ray-tracing can be used to simulate other optical 
effects, such as motion blur, finite-sized light 
sources, prismatic effects, etc., and to remove 
many of the artifacts due to point sampling the 
image [Cook, Porter and Carpenter, 1984]. 
Ray-casting can also be used to generate line 
drawings and sectioned views, and to perform 
the volume integrals needed for the calculation 
of mass properties [Roth, 1982]. Another 
advantage of ray-tracing is that it is 
conceptually elegant and easy to implement. The 
models that comprise the scene can be rendered 
if a procedure to intersect a ray with their 
surfaces is provided. Because of the 
object-oriented architecture, a ray-tracing 
system is easy to maintain and extend. The 
number of geometric primitives that can be 
ray-traced is quite large and continues to grow. 

The major disadvantage of the standard
ray-tracing algorithm is that the time needed to
generate an image is proportional to the number of
geometric primitives times the size of the
output image. This is because when an individual
ray is being traced all the objects in the scene
need to be tested to check for an intersection. As
a result there are many more ray-surface 
intersection calculations performed than there 
are rays intersecting visible surfaces. Several 
approaches have been attempted to reduce the 
number of intersection tests. Rubin and Whitted 
[1980] use a hierarchical tree of bounding boxes 
to describe a scene. Since the bounding volumes 
of the children lie entirely within the bounding 
volumes of the parent, the child volumes need 
only be searched if the ray intersects the parent 
volume. An alternative approach is to decompose 
space into a set of disjoint volumes. Each volume 
in the subdivision contains a list of those 
surfaces contained within it, and the subdivision 
is made fine enough so that the total number of 
objects in each volume is a small number. The 
search for an object intersection proceeds along 
the path of the ray through the subdivision. Two 
different subdivision methods have been 
reported. One method decomposes space into a 
rectangular array of voxels. In this case the 
volumes along the path of the ray can be 
determined using a 3-d incremental line drawing 
algorithm [Fujimoto and Iwata, 1985; Haeberli, 
1985]. The second method decomposes space with 
an oct-tree [Glassner, 1984; Kaplan, 1985]. In 
this case, determining the next volume in the 
path is more complicated, but this disadvantage 
is offset by the fact that the oct-tree 
decomposition takes less space for a given level 
of detail. 
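The hierarchical bounding-volume idea can be sketched as follows. This is an illustration only, not the implementation of Rubin and Whitted or of any cited system: the class names, the use of axis-aligned boxes around spheres, and the `stats` counter are assumptions.

```python
import math


class Sphere:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bounds(self):
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi

    def intersect(self, origin, direction):
        """Smallest positive ray parameter of the hit, or None (direction must be unit length)."""
        oc = tuple(o - c for o, c in zip(origin, self.center))
        b = sum(d * e for d, e in zip(direction, oc))
        c = sum(e * e for e in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in (-b - root, -b + root):
            if t > 1e-9:
                return t
        return None


class Node:
    """Bounding box enclosing either child nodes or a single leaf object."""

    def __init__(self, children=(), leaf=None):
        self.children, self.leaf = list(children), leaf
        if leaf is not None:
            boxes = [leaf.bounds()]
        else:
            boxes = [(c.lo, c.hi) for c in self.children]
        self.lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
        self.hi = tuple(max(b[1][i] for b in boxes) for i in range(3))

    def ray_hits_box(self, origin, direction):
        """Slab test: does the ray pass through the box for some t >= 0?"""
        t_near, t_far = 0.0, math.inf
        for o, d, lo, hi in zip(origin, direction, self.lo, self.hi):
            if d == 0.0:
                if o < lo or o > hi:
                    return False
                continue
            t0, t1 = (lo - o) / d, (hi - o) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:
                return False
        return True


def closest_hit(node, origin, direction, stats):
    """Nearest (t, object) under node; subtrees whose box is missed are never searched."""
    if not node.ray_hits_box(origin, direction):
        return None
    if node.leaf is not None:
        stats["tests"] += 1
        t = node.leaf.intersect(origin, direction)
        return (t, node.leaf) if t is not None else None
    best = None
    for child in node.children:
        hit = closest_hit(child, origin, direction, stats)
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return best
```

In a scene of four spheres grouped into two pairs, a ray aimed at one pair performs two ray-sphere tests instead of four, because the box around the other pair is rejected with a single slab test.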

In this paper we develop another method for 
speeding up the search for ray intersections by 
combining two methods previously reported in 
the literature: beam-tracing [Heckbert and 
Hanrahan, 1984] and coherent ray-tracing [Speer, 
DeRose and Barsky, 1985]. We discuss how the 
concept of a beam-tree can be used to 
characterize the coherence contained in an 
image. The beam-tree also suggests that the 
ray-surface intersections should be searched for 
in breadth-first order, that is, all the 
intersections with a given surface should be 
found before proceeding to the next surface. We 
also discuss several new methods for caching 
rays. 

The Beam Tree

Heckbert and Hanrahan [1984] described a 
method to trace beams through a scene 
consisting of polygonal objects. This method was 
based on the observation that neighboring rays 
have essentially the same object intersection 
tree. This coherence can be quantified by 
introducing the notion of the beam-tree. In a 
ray-tree (as described in Whitted [1980]), the 
links represent rays of light and the nodes 
represent the surfaces that those rays 
intersected. Similarly, the links in a beam-tree 
represent beams of light, and the nodes contain a 
list of surfaces intersected by a beam. Each of 
the surfaces intersected by the beam spawns 
new beams corresponding to reflections, 
refractions and shadows. 

In Heckbert and Hanrahan [1984] the beams 
of light were pyramidal cones. The original beam 
was the viewing pyramid and since the objects 
in the scene were polygons, all the secondary 
beams also had polygonal cross-sections. In the 
case of reflection and shadows, and under certain 
assumptions, refraction, it was shown that the 
new beams were also pyramidal cones, that is,
they contained a single apex. These restrictions 
allowed an entire beam to be traced at once by 
using a recursive polygonal hidden surface 
algorithm similar to that described in Weiler and 
Atherton [1977]. 

In case of curved surfaces or of true 
refraction, the form of a beam changes 
drastically when it interacts with a surface. 
Therefore, it is difficult to devise an algorithm 
to trace all the rays contained within it 
simultaneously. However, as we will 
demonstrate, it is still possible to take 
advantage of the coherence of a beam. In the 
general case we define a beam as a set of rays 
that all originate from the same object (or from 
the same point) and all intersect the same 
object. Generally, rays grouped together into 
beams will belong to adjacent pixels, although 
this is not strictly required. For example, all the 
rays through the eye that hit the background may 
be considered a single beam even though the
regions comprising the background may not be
connected. A beam under this definition need not
be uniform; for example, it might contain
caustics and other singularities. An example of a
coherent beam that contains a singularity is one
which passes through a lens or into a crystal
ball.

Notice that the beam tree for a given image 
(scene plus point of view) is independent of the 
method used to generate it. Given the ray-trees 
for all the pixels in the scene, the beam-tree 
could be created as a post-process by recursively 
merging adjacent rays if they intersect the same 
surface. The size of the beam-tree is a natural 
measure of the intrinsic coherence in an image. 


$$\text{coherence} = \frac{\text{average ray-tree size}}{\text{total beam-tree size}}$$

where the average ray-tree size is the total
number of nodes in all the ray-trees divided by the
number of pixels. If at each level of the tree all
the rays could be coalesced into a single beam 
then the size of the beam-tree would be the same 
as the average ray-tree size. This would be the 
maximum coherence possible. If the beam-tree is 
any bigger, this implies that adjacent ray-trees 
could not be merged, resulting in a new 
sub-beam, and therefore, less coherence. Notice 
that using this definition, the coherence does not 
depend on the relative sizes of the different 
sub-beams. For example, an image with one large 
beam and two small beams has the same 
coherence as an image with three equal-sized 
beams. 
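The coherence measure can be made concrete with a small sketch. The representation is an assumption made for illustration: a ray-tree is a pair (surface, {ray kind: subtree}), and the beam-tree is built as a post-process by merging all rays that reach the same surface along the same path, whether or not their pixels are adjacent.

```python
def ray_tree_size(tree):
    """Number of nodes in one ray-tree; a tree is (surface_id, {ray_kind: subtree})."""
    _surface, children = tree
    return 1 + sum(ray_tree_size(child) for child in children.values())


def merge_into_beam_tree(beam_tree, tree):
    """Merge one ray-tree into a beam-tree keyed by the surface hit at each node."""
    surface, children = tree
    node = beam_tree.setdefault(surface, {})
    for kind, child in children.items():
        merge_into_beam_tree(node.setdefault(kind, {}), child)


def beam_tree_size(beam_tree):
    """Number of beams: one per distinct surface reached along each distinct path."""
    return sum(
        1 + sum(beam_tree_size(sub) for sub in kinds.values())
        for kinds in beam_tree.values()
    )


def coherence(ray_trees):
    """Average ray-tree size divided by the total beam-tree size."""
    average = sum(ray_tree_size(t) for t in ray_trees) / len(ray_trees)
    beams = {}
    for t in ray_trees:
        merge_into_beam_tree(beams, t)
    return average / beam_tree_size(beams)
```

Under this sketch an image whose rays all follow the same path has coherence 1, and an image split among three beams has coherence 1/3 regardless of how the pixels are divided among them.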

The amount of work needed to compute an 
image is the sum of two factors: shading and 
intersection processing. The number of 
calculations to shade the image is proportional 
to the size of the image times the average size 
of the ray tree. The optimal number of 
calculations needed to compute the ray-tree at 
each pixel is proportional to the size of the 
beam-tree. Optimistically, the cost of 
computing each node in the beam-tree would be
proportional to the number of objects in the
scene. Thus, coherence allows us to decouple the
complexity of the intersection phase of the
calculation (which is most sensitive to the
number of objects in the scene) from the shading
phase of the calculation (which is most
sensitive to the resolution of the image). In
particular, notice that in standard ray-tracing 
the cost per pixel is multiplied by the number of 
objects, whereas using beam tracing this cost is 
amortized over the average number of pixels per 
beam. Thus, beam-tracing wins big at high 
resolutions. 
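The argument can be restated as a back-of-the-envelope cost model. The two functions below are a sketch under the optimistic assumptions stated above (one complete search per beam-tree node, unit cost per intersection test and per shading calculation); the names are hypothetical and the numbers they produce are not measurements.

```python
def standard_cost(pixels, avg_ray_tree_size, objects):
    """Every ray is tested against every object, then shaded."""
    intersection = pixels * avg_ray_tree_size * objects
    shading = pixels * avg_ray_tree_size
    return intersection + shading


def beam_cost(pixels, avg_ray_tree_size, beam_tree_size, objects):
    """One complete search per beam-tree node; shading still grows with the image size."""
    intersection = beam_tree_size * objects
    shading = pixels * avg_ray_tree_size
    return intersection + shading
```

Quadrupling the number of pixels quadruples the standard cost, whereas in the beam model only the shading term grows as long as the beam-tree itself does not change.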


Caching 

Speer, DeRose and Barsky [1985] described a 
method to speed up ray-tracing which they term 
coherent ray-tracing. Their method is based on 
the same observation as contained in Heckbert 
and Hanrahan [1984]: that adjacent rays have a 
high probability of intersecting the same 
objects. However, instead of attempting to trace 
many rays simultaneously, they save the ray-tree 
corresponding to the previous ray and use it to 
guide the next intersection test. The ray-tree is 
intended to act as a cache. A cache hit occurs if 
the next ray intersects the same surface; a miss 
occurs if another surface is intersected. The 
cache is complicated by the fact that, although 
the same object may be hit by the next ray, 
another object may block the ray before it hits 
that object. They solved this problem by using a 
cache with two types of information: the last 
object intersected and a cylindrical region of 
safety. The cylinder of safety is the largest 
region surrounding the ray which does not 
contain any other surfaces. When a cache hit 
occurs, the ray is only tested against the last 
sphere and the cylinder. Thus, since only two 
tests need to be done, if there is a cache hit the 
average cost of computing an intersection per 
ray is constant within a beam. 

We implemented this technique and upon
examining caching statistics found that there 
were many cache misses even though the same 
sphere was still intersected. This is because the 
cylindrical region of safety is much smaller than 
the beam cross-section (see Figure 1). To 
remedy this, we devised a method which may in 
some situations require more work, but will 
cause a cache miss only if the last sphere was 
not intersected. 

Figure 2 shows a situation where a ray hits 
one sphere and then hits a second sphere. The 
cache contains the last sphere hit and a list of 
spheres that could potentially block a ray 
travelling from the first to the second. Normally 
this list is empty or contains only a small 
number of spheres. Any ray originating on the 
surface of the first sphere that also intersects 
the second sphere can only intersect objects 
contained on this list of potential blocking 
spheres. A cache miss occurs if, first, the ray 
does not intersect the second sphere, or second, 
it intersects a sphere contained in the list of 
blocking spheres. Using this caching system all 
misses imply that a new object has been 
intersected. Another advantage of this technique 
is that no special ray-cylinder intersection tests 
are required. 
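A cache entry of this kind and its hit test might look like the sketch below. It is not the author's code: the names are hypothetical, spheres are (center, radius) pairs, ray directions are assumed to be unit length, and a blocker is counted only if it is hit before the cached sphere, which is a slight refinement of the rule stated above.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Nearest positive ray parameter for a unit-length direction, or None if missed."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-9:
            return t
    return None


class CacheEntry:
    """Last sphere hit plus the spheres that could block a ray on its way there."""

    def __init__(self, target, blockers):
        self.target = target        # (center, radius)
        self.blockers = blockers    # list of (center, radius), usually short or empty


def cached_intersection(entry, origin, direction):
    """Return the hit distance on a cache hit, or None on a cache miss.

    A miss means the ray reaches some other surface (or none at all),
    so the caller must fall back to a complete search.
    """
    t_target = hit_sphere(origin, direction, *entry.target)
    if t_target is None:
        return None                 # the cached sphere is no longer hit
    for center, radius in entry.blockers:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and t < t_target:
            return None             # a potential blocker is hit first
    return t_target
```

The cost of a cache hit is one ray-sphere test for the cached sphere plus one for each sphere on the blocker list, and only the ordinary ray-sphere routine is needed.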


There are several ways in which the list of
potential blocking spheres is generated. The most
common situation is where the ray originates
from the eye point or is travelling to a light
source. In this case the list consists of all
the spheres contained in a cone from the point
to the sphere that was intersected. The second
most common situation is where a ray travels
between two spheres. In this case the list consists
of all the spheres that lie within a cone
circumscribed around the two spheres and between them.
The third situation is where a ray enters
the interior of a transparent sphere and
intersects the same sphere again. In this case the
list consists of all the spheres that
intersect the interior of the transparent sphere.
It should be mentioned that it is possible to
precompute the lists of potential blocking spheres
for a given scene before an image is generated.
However, the naive algorithm to do this is
O(n^3).
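The naive precomputation for the sphere-to-sphere case can be sketched as three nested loops. The code is illustrative only: instead of the exact circumscribed cone it uses a conservative stand-in (a capsule around the segment joining the two centers, with the larger of the two radii), so a list may contain a few spheres that cannot actually block. The function names are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Distance from point p to the segment from a to b."""
    ab = [y - x for x, y in zip(a, b)]
    ap = [y - x for x, y in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [x + t * c for x, c in zip(a, ab)]
    return math.dist(p, closest)


def blocker_lists(spheres):
    """Map each ordered pair (i, j) to the indices of spheres that might block a ray from i to j."""
    lists = {}
    n = len(spheres)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (ci, ri), (cj, rj) = spheres[i], spheres[j]
            reach = max(ri, rj)   # capsule radius: conservative bound on the cone
            lists[(i, j)] = [
                k for k in range(n)
                if k not in (i, j)
                and distance_to_segment(spheres[k][0], ci, cj) <= reach + spheres[k][1]
            ]
    return lists
```

Over-estimating a list costs only a few extra ray-sphere tests on each cache lookup; a sphere missing from a list would produce a wrong image, which is why the stand-in errs on the large side.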

This new method works well for rays that 
travel from sphere to sphere, from point to 
sphere, or from sphere to point, but does not 
work if a ray doesn't intersect any objects. In 
this case the original method due to Speer, 
DeRose and Barsky should be used. 



Figure 1 - This figure shows two spheres (in 
bold) and a cylinder of safety viewed along the 
axis of the ray (marked with a +). Notice that 
the cylinder is much smaller than the visible 
part of the large sphere. 


Figure 2 - This figure shows a ray reflecting off 
the large sphere on the left and hitting the 
smaller sphere on the right. Around these 
spheres is a cone which contains a single sphere 
which might potentially block another ray 
travelling between the same two spheres. 

Breadth-first Search

Caching will work effectively if the 
searches are ordered in a way which maximizes 
the probability that a cache hit will occur. For 
example, if rays were randomly chosen from 
different pixel locations then we would expect 
few cache hits. Thus, the effectiveness of 
caching depends both on the intrinsic coherence 
in the scene and the search strategy employed. 
Fortunately, we are at liberty to reorder the 
search. This is analogous to the situation 
encountered in optimizing compilers where
instruction execution order is rearranged to 
maximize the number of cache memory hits. The 
goal is to use knowledge about the general 
properties of the beam-tree so that searches for 
ray-surface intersections can be ordered in a 
way that maximizes the probability of cache 
hits. 

The first important point is that the cache 
should be organized as a tree. There is little 
reason to suspect that a reflected ray will hit 
the same object as the refracted ray or the 
incident ray. Practically this means that if a 
cache miss occurs at a parent node then we 
should flush the cache of all the child nodes. 

The second and major point is that the 
beam's cross-section is two-dimensional, not 
one-dimensional. Consider the simplified case 
where the ray-tracer is only being used to 
remove hidden surfaces, so that there are no 
reflected or transmitted rays. In the standard 
ray-tracer, rays are generated in scanline order. 
The number of cache misses per scanline is equal 
to the number of regions crossing that scanline. 
Each cache miss causes all the objects to be 
searched. The total number of complete searches 
is therefore much greater than the total number 
of regions. In order to achieve one complete 
search per region the search should continue
two-dimensionally until a cache miss occurs.
This is similar to the common seed fill or
boundary fill algorithm used in paint systems
[Smith, 1979]. This search method also works
when the tree has greater depth. If we imagine
the complete ray-tree as including all the rays
emanating from the eye point, the region fill 
corresponds to a breadth-first search of this 
tree. 

The initial reaction to breadth-first search
is that since images are so large, the size of the
list of rays queued would be prohibitively large.
However, it is possible to organize the search in
scanline order so that the list is kept to a
reasonable size.
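The difference between the two search orders can be illustrated on a small grid. In the sketch below, which is not the author's program, `ids[y][x]` stands for the surface visible at a pixel, and the comparison with the cached surface stands in for the cached intersection test; the function names are hypothetical. The fill uses a plain queue and does not attempt the scanline organization mentioned above.

```python
from collections import deque


def scanline_searches(ids):
    """Complete searches when rays are traced in scanline order with a one-entry cache."""
    searches = 0
    for row in ids:
        cached = None                     # cache assumed empty at the start of each scanline
        for surface in row:
            if surface != cached:         # cache miss: all the objects are searched
                searches += 1
                cached = surface
    return searches


def region_fill_searches(ids):
    """Complete searches when each cache hit is followed in two dimensions (seed fill)."""
    height, width = len(ids), len(ids[0])
    done = [[False] * width for _ in range(height)]
    seeds = deque([(0, 0)])
    searches = 0
    while seeds:
        x, y = seeds.popleft()
        if done[y][x]:
            continue
        searches += 1                     # one complete search for the seed pixel
        cached = ids[y][x]
        done[y][x] = True
        frontier = deque([(x, y)])
        while frontier:
            fx, fy = frontier.popleft()
            for nx, ny in ((fx + 1, fy), (fx - 1, fy), (fx, fy + 1), (fx, fy - 1)):
                if 0 <= nx < width and 0 <= ny < height and not done[ny][nx]:
                    if ids[ny][nx] == cached:    # cache hit: keep filling
                        done[ny][nx] = True
                        frontier.append((nx, ny))
                    else:                         # cache miss: revisit later as a seed
                        seeds.append((nx, ny))
    return searches
```

On a 4 by 4 image containing three connected regions, the scanline order performs one complete search per region per scanline, while the fill performs one per region.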

Results 

To test these ideas we implemented a
simple cached ray-tracer for spheres. The code
was written so it was easy to turn caching on or
off. This program was run over a variety of
different scenes with similar results. In the
table below the scene consisted of an N x N x N
array of spheres whose centers and radii were
randomly jittered. This cube of spheres was then
viewed from an angle and scaled so that it filled
the screen. As can be seen from Table 1, caching
itself sped up the ray-tracer by a factor of 2 to 5;
adding breadth-first search sped it up by about
another factor of 2.


Number of spheres         3^3     4^3     5^3
Not cached               1.00    1.00    1.00
Cached                    .53     .29     .20
Cached, breadth-first     .25     .15     .11

Table 1 - Timing results



Conclusions 


Although these results are preliminary, this
area of research looks promising. Analyzing the
caching statistics and comparing them to the
actual coherence as measured by the beam-tree,
we find that the number of complete searches is
still much larger than the theoretical optimum.
In particular the cache hit test is not very
successful when a ray had not previously hit an
object. Improving this case would significantly
speed up the program.

Spheres were chosen in this study because 
the intersection tests are easy to implement and 
because spheres can also be used to bound the 
extent of other object types. An avenue for 
further research is to devise cache tests for 
other object types. In the general case it may be 
worthwhile to allow different caching strategies 
for different objects. For example, the polygons 
in a convex polyhedral solid cannot occlude other 
polygons of that solid. Thus, these polygons 
cannot be on the list of potential blockers. 
Sometimes even in the case when a search 
through all the objects in the scene need be done, 
a cache can be used to speed this up. Considering 
again the case of a convex polyhedral solid, we 
can cache the last polygon hit. If that polygon is 
missed by the next ray, then we should search 
polygons adjacent to it first. 

Caching is a very general method for 
speeding up computations when coherence exists. 
For this reason it can be used along with other 
methods, such as cellular decomposition, to 
speed up the search for ray-surface 
intersections. It is also likely that many other 
hidden surface algorithms would benefit from 
caching. 

Finally, the idea of breadth-first search was 
originally motivated by the desire to build an 
interactive ray-tracing tool. If each ray is 
immediately painted onto the image after it is 
traced, then the details of the image will 
gradually be filled in. First, a hidden surface
view will be drawn, followed by reflections and 
shadows at greater and greater depth. 

ABSTRACT

There has been a great interest recently in systems
that use graphics to aid in the programming,
debugging, and understanding of computer programs. The
terms "Visual Programming" and "Program
Visualization" have been applied to these systems. Also,
there has been a renewed interest in using examples
to help alleviate the complexity of programming.
This technique is called "Programming by Example."
This paper attempts to provide more meaning to these
terms by giving precise definitions, and then uses
these definitions to classify existing systems into a
taxonomy.


RÉSUMÉ

Les systèmes qui utilisent l'infographie pour aider à
la programmation, à la mise au point et à la
compréhension de logiciels ont récemment suscité
beaucoup d'intérêt. Les termes "programmation
visuelle" et "visualisation de programmes" ont été
associés à ces systèmes. Il y a aussi eu un renouveau
d'intérêt pour l'utilisation d'exemples pour aider à
simplifier la programmation. On parle alors de
"programmation par exemples". Nous essaierons de définir
ces termes avec plus de précision et utiliserons ces
définitions comme base pour établir une taxonomie
des systèmes disponibles actuellement.


Key Words and Phrases: Visual Programming,
Program Visualization, Programming by Example,
Inferencing, Automatic Programming, Flowcharts,
Debugging Aids, Program Synthesis, Documentation,
Computer Languages.

Extended Summary.

NOTE: This paper is a summary of [Myers 86].
The reader should refer to that paper for full
information.

As the distribution of personal computers and the
more powerful personal workstations grows, the
majority of computer users now do not know how to
program. They buy computers with packaged
software and are not able to modify the software even
to make small changes. In order to allow the end user
to reconfigure and modify the system, the software
may provide various options, but these often make the
system more complex and still may not address the
users' problems. "Easy-to-use" software, such as the
"Direct Manipulation" systems [Shneiderman 83],
actually makes the user-programmer gap worse since
more people will be able to use the software (since it
is easy to use), but the internal program code is now
much more complicated (due to the extra code to
handle the user interface). Therefore, systems are
moving in the direction of providing end user
programming. It is well-known that conventional
programming languages are difficult to learn and use [Gould
84], requiring skills that many people do not have. In
an attempt to make the programming task easier,
recent research has been directed towards using
graphics. This has been called "Visual Programming"
or "Graphical Programming". Some Visual
Programming systems have successfully demonstrated that
non-programmers can create fairly complex programs
with little training [Halbert 84].

Another motivation for using graphics is that it 
tends to be a higher-level description of the desired 
actions (often de-emphasizing issues of syntax and 
providing a higher level of abstraction) and may 
therefore make the programming task easier even for 
professional programmers. This may be especially 
true during debugging, where graphics can be used to 
present much more information about the program 
state (such as current variables and data structures) 
than is possible with purely textual displays. This is 
one of the goals of Program Visualization. Other
Program Visualization systems use graphics to help teach
computer programming.

Programming-by-Example is another technology
that has been investigated to make programming
easier, especially for non-programmers. It involves
presenting to the computer examples of the data that 
the program is supposed to process and using these 
examples during the development of the program. 
Many, although not all, Programming-by-Example 
systems have also used Visual Programming, so these 
two technologies are often linked. 

Recently, there has been a large number of articles
about systems that incorporate some or all of
these features [Grafton 85][Raeder 85]. Unfortunately,
the terms have been used imprecisely¹, and
there has not been a comprehensive taxonomy that
classifies these systems. This paper summarizes
research that attempts to fill this gap in the literature.
The full results are reported in [Myers 86]. First, the 
important terms are defined in a precise manner, and 
then these definitions are used to differentiate some 
example systems. 

There are many systems that could be included in 
this paper in the various categories, but no attempt 
has been made to be comprehensive. It is hoped that 
the selection of systems listed will help the reader 
understand the intent of the classification system. 

Definitions. 

Programming: What is meant by computer "programming"
is probably well understood, but it is important
to have a definition that can be used to eliminate
some limited systems. In this paper, "program" is
defined as "a set of statements that can be submitted
as a unit to some computer system and used to direct
the behavior of that system" [Oxford 83]. While the
ability to compute "everything" is not required, the
system must include the ability to handle conditionals
and iteration, at least implicitly.

Interactive vs. Batch: Any programming language
system may either be "interactive" or "batch." A
batch system has a large processing delay before
statements can be run while they are compiled,
whereas an interactive system allows statements to be
executed when they are entered. This characterization
is actually more of a continuum than a dichotomy
since even interactive languages like LISP typically
require groups of statements (such as an entire procedure)
to be specified before they are executed.


¹ For example, Zloof's Query-By-Example system [Zloof 77 and
81] is not a Programming by Example system.


Visual Programming: "Visual Programming" (VP)
refers to any system that allows the user to specify a
program in a two (or more) dimensional fashion. Conventional
textual languages are not considered two
dimensional since the compiler or interpreter
processes them as a long, one-dimensional stream.
Visual Programming includes conventional flow
charts and graphical programming languages. It does
not include systems that use conventional (linear) programming
languages to define pictures. This eliminates
most graphics editors, like Sketchpad [Sutherland 63].

Program Visualization: "Program Visualization" (PV)
is an entirely different concept from Visual Programming.
In Visual Programming, the graphics is the
program itself, but in Program Visualization, the program
is specified in the conventional, textual manner,
and the graphics is used to illustrate some aspect of
the program or its run-time execution. Unfortunately,
in the past, many Program Visualization systems have
been incorrectly labeled "Visual Programming" (as in
[Grafton 85]). Program Visualization systems can be
divided along two axes: whether they illustrate the
code or the data of the program, and whether they are
dynamic or static. "Dynamic" refers to systems that
can show an animation of the program running,
whereas "static" systems are limited to snapshots of
the program at certain points. If a program created
using Visual Programming is to be displayed or
debugged, clearly this should be done in a graphical
manner, but this would not be considered Program
Visualization. Although these two terms are similar
and confusing, they have been widely used in the
literature, so it was felt appropriate to continue to use
the common terms.

Programming by Example: The term "Programming
by Example" (PBE) has been used to describe a large
variety of systems. Some early systems attempted to
create an entire program from a set of input-output
pairs. Other systems require the user to "work
through" an algorithm on a number of examples and
then the system tries to infer the general program
structure. This is often called "automatic programming"
and has generally been an area of Artificial
Intelligence research.

Recently, there have been a number of systems 
that require the user to specify everything about the 
program (there is no inference involved), but the user 
can work out the program on a specific example. The 
system executes the user's commands normally, but 
remembers them for later re-use. Bill Buxton coined 
the phrase "Programming with Examples” to more 
accurately describe these systems. Halbert [84] 
characterizes Programming with Examples as "Do 
What I Did” whereas inferential Programming by 
Example might be "Do What I Mean”. The term 
"Programming by Example” will be used to include 
both inferencing systems and Programming With 
Example systems. 

Of course, whenever code is executed in any system,
test data must be entered to run it on. The distinction
between normal testing and "Programming
with Examples" is that in the latter the system
requires or encourages the specification of the examples
before programming begins, and then applies the
program as it develops to the examples. This essentially
requires all Programming-with-Example systems
(but not Programming-by-Example systems with
inferencing) to be interactive.
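
The "Do What I Did" style can be illustrated with a minimal Python sketch. It is not taken from any of the systems cited; the class `ExampleRecorder` and the two commands are hypothetical names introduced only for illustration. Each command is executed normally on a concrete example and also remembered, so that the recorded sequence can later be re-applied to other data. No inference is involved, and the sketch leaves out conditionals and iteration, which a full programming system would need.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical command type: an operation that transforms a list of strings.
Command = Callable[[List[str]], List[str]]


@dataclass
class ExampleRecorder:
    """Executes commands on an example and remembers them for re-use."""
    script: List[Tuple[str, Command]] = field(default_factory=list)

    def do(self, name: str, command: Command, example: List[str]) -> List[str]:
        result = command(example)             # execute normally on the example
        self.script.append((name, command))   # remember the step
        return result

    def replay(self, data: List[str]) -> List[str]:
        # Re-apply the remembered steps, in order, to new data.
        for _, command in self.script:
            data = command(data)
        return data


def drop_empty(lines: List[str]) -> List[str]:
    return [line for line in lines if line]


def upper_all(lines: List[str]) -> List[str]:
    return [line.upper() for line in lines]


recorder = ExampleRecorder()
example = ["alpha", "", "beta"]
example = recorder.do("drop_empty", drop_empty, example)
example = recorder.do("upper_all", upper_all, example)
replayed = recorder.replay(["x", "", "y", ""])
```

Because the user sees the effect of every step on the example as it is entered, a recorder of this kind only makes sense in an interactive setting, which matches the observation above.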

Taxonomy of Programming Systems. 

This paper presents two taxonomies. The first is
for systems that support programming. The second
taxonomy is for systems that use graphics after the
programming process is finished (Program Visualization
systems).

A meaningful taxonomy can be created by classifying
programming systems into eight categories
using the orthogonal criteria of
• Visual Programming or not,
• Programming by Example or not, and
• Interactive or batch.

Of course, a single system may have features that fit
into various categories and some systems may be hard
to classify, so this paper attempts to characterize the
systems by their most prominent features. Figure 1
shows the division with some sample systems.
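
As a rough illustration of why three orthogonal yes/no criteria give eight categories, the following Python sketch enumerates them. The type `Category` and the helper `describe` are hypothetical names, and only three systems are placed: Pascal and LISP as described in the definitions above, and SmallStar as read from Figure 1 together with the remark that Programming-with-Example systems must be interactive.

```python
from itertools import product
from typing import NamedTuple


class Category(NamedTuple):
    visual: bool       # Visual Programming or not
    by_example: bool   # Programming by Example or not
    interactive: bool  # interactive or batch


# Three independent boolean criteria give 2 * 2 * 2 = 8 categories.
ALL_CATEGORIES = [Category(*bits) for bits in product([False, True], repeat=3)]

# A few placements for illustration (see the lead-in for their source).
SYSTEMS = {
    "Pascal": Category(visual=False, by_example=False, interactive=False),
    "LISP": Category(visual=False, by_example=False, interactive=True),
    "SmallStar": Category(visual=True, by_example=True, interactive=True),
}


def describe(category: Category) -> str:
    return ", ".join([
        "VP" if category.visual else "not VP",
        "PBE" if category.by_example else "not PBE",
        "interactive" if category.interactive else "batch",
    ])


for name, category in SYSTEMS.items():
    print(f"{name}: {describe(category)}")
```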


Taxonomy of Program Visualization Systems. 

The systems listed below are not programming 
systems since code is created in the conventional 
manner. Graphics in these are used to illustrate some 
aspect of the program after it is written. Figure 2 
shows some Program Visualization systems classified 
by whether they attempt to illustrate the code or the 
data of a program (some provide both), and whether 
the displays are static or dynamic. 

Conclusions. 

Visual Programming, Programming by Example 
and Program Visualization are all exciting areas of 
active computer science research, and they promise to
improve the user interface to programming environments.
A number of interesting systems have been
created in each area, and there are some that cross 
the boundaries. This paper has attempted to classify 
some of these systems in hopes that this will clarify 
the use of the terms and provide a context for future 
research. 

Systems without Programming by Example
Columns: Batch, Interactive

Not VP
    All Conventional Languages: Pascal, Fortran, etc.
    LISP, APL, etc.

VP
    Grail [Ellis 69]
    AMBIT/G/L [Christensen 68,71]
    Query by Example [Zloof 77, 81]
    FORMAL [Shu 85]
    GAL [Albizuri-Romero 84]
    Graphical Program Editor [Sutherland 66]
    PIGS [Pong 83]
    Pict [Glinert 84]
    PROGRAPH [Pietrzykowski 83,84]
    State Transition UIMS [Jacob 85]

Systems with Programming by Example
Columns: Batch, Interactive

Not VP
    I/O pairs* [Shaw 75]
    Tinker [Lieberman 82]

VP
    traces* [Bauer 78]
    AutoProgrammer* [Biermann 76]
    Pygmalion [Smith 77]
    Graphical Thinglab [Borning 86]
    SmallStar [Halbert 81,84]
    Rehearsal World [Gould 84]


Figure 1.

Classification of programming systems by whether they
are visual or not, whether they have Programming by Example
or not, and whether they are interactive or batch.
Starred systems (*) have inferencing, and non-starred PBE
systems use Programming With Example.


        Static                             Dynamic

Code    Flowcharts [Haibt 59]              BALSA [Brown 84]
        SEE Visual Compiler [Baecker 86]   PV Prototype [Brown 85]
        PegaSys [Moriconi 85]

Data    TX2 Display Files [Baecker 68]     Two Systems [Baecker 75]
        Incense [Myers 80,83]              Sorting out Sorting [Baecker 81]
                                           BALSA [Brown 84]
                                           Animation Kit [London 85]
                                           PV Prototype [Brown 85]


Figure 2.

Classification of Program Visualization Systems by whether
they illustrate code or data, and whether they are
dynamic or static.

Abstract

A basic architecture for a User Interface Management System is 
presented. The problem of updating a display in response to 
interactive commands is discussed. The basic architecture is 
then extended to include basic editing and browsing processes 
on arbitrary data structures. Editing templates are presented as a 
technique which embodies the entire manipulation process for a 
particular data structure / data display combination. Such 
templates in conjunction with the User Interface Management 
System are able to automatically provide a majority of the code 
required in an interactive application. 

Introduction 

Within the graphics community the concept of a User Interface
Management System (UIMS) has come into usage [THO83].
Such systems have been developed to overcome the high cost
of implementing interactive graphics programs with quality
human-computer interfaces. Most such systems have
concentrated on the problem of input dialogue management
[KAM83, GRE85, JAC83, VAN83]. Having developed three
such systems in our laboratory [OLS83, OLS85a, OLS85b,
OLS85c], we have become concerned with the problem of data
display in an interactive program. In implementing interactive
programs we have found that the input dialogue can be
programmed in a matter of hours or days, using our tools, but
the code to update the display after each modification to the
application data structure takes months to implement.

In attacking this display update problem we approached it from 
the point of view of an intelligent display processor. Our 
original architecture for an interactive program is shown in 
Figure 1. 


[Figure 1. Original architecture for an interactive program. Diagram labels include: Format, Description.]

In this architecture the input events are parsed according to the
dialogue description and, based on the input, one or more of
the application command procedures is invoked. It is the
responsibility of these application command procedures to
update some application data structure. It is the role of the
display processor to create a graphical presentation of the
application data according to the specified format. There is a
wide range of possible formats for displaying the data, which
will be touched upon only lightly here. This architecture is
somewhat similar to that proposed at the Seeheim workshop on 
user interface management [PFA85]. The key problem of 
interest in this paper is how the graphical image should be 
updated whenever the application data is changed. The obvious 
solution is to simply redraw the entire image. This is an 
extremely poor solution if one desires reasonable response 
time. 

After some experience with the above model we determined 
that a closer relationship between the data editing commands 
and the display update functions is essential. A more acceptable 
system architecture is shown in Figure 2. 

[Figure 2. Revised system architecture. Diagram labels include: Dialogue Description.]

In this architecture a large portion of the interactive dialogue is 
viewed as a process of browsing and/or editing the application 
data structure. Based on this view, a description of the 
application data is taken together with a selection of browsing 
and editing methods to generate the dialogue definition, display 
update procedures and application command procedures for the 
selected operations. Note that this does not generate the entire 
interactive program but simply the editing portion which is the 
most display update intensive. Any analysis, file management 
or help facilities can all be added. In our experience it is the
editing and browsing operations which dominate an interactive
program.


This paper will then proceed in the following fashion. First a 
data management facility called STUF (STrUctured Files) will 
be reviewed so as to define the possible data structures that 
might be represented in this UIMS model. This will be 
followed by a short description of MIKE (Menu Interaction 
Kontrol Environment) which is our input dialogue system. 
Having set the stage for our editing environment, editing 
templates will be described which provide the screen update 
facilities that we are interested in. 


STUF 

The STUF package was developed as a data model specifically 
for interactive applications. Our desire was to create a data 
model which would behave equally well in main memory or on 
secondary storage. We also imposed the requirement that 
STUF data must be accessible either relationally (as in a 
relational database) or as a linked structure (as with pointers in 
Pascal). The reason for this is that relational data models are 
very powerful but in many cases are too inefficient for practical 
graphics use. 

Each STUF file has a specific filetype which is defined by a
set of datatypes. A datatype is defined to be either a Record or
a Union. A datatype consists of a list of fields each of which
has a fieldtype. A Record is defined to have all of the fields in
the list. A Union is defined to have one of the fields in the list
(as determined by a tag field at run time). A Union is similar in
capability to a Pascal variant record. A fieldtype is either an
Integer, Real, Char, Boolean or a reference to an object of
some other datatype. A Field can also have dimension so as to
create fixed length arrays.
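
The filetype description above can be pictured with a small Python sketch. This is not the STUF implementation (which was written for Pascal); the names `FieldDef`, `DataType` and `check_filetype` are hypothetical and only illustrate the rule that every fieldtype is either one of the four primitives or a reference to another datatype in the same filetype.

```python
from dataclasses import dataclass
from typing import List

PRIMITIVES = {"Integer", "Real", "Char", "Boolean"}


@dataclass
class FieldDef:
    name: str
    fieldtype: str       # a primitive name, or the name of another datatype
    dimension: int = 1   # values above 1 describe a fixed length array


@dataclass
class DataType:
    name: str
    kind: str            # "Record" (all fields) or "Union" (one field, chosen by a tag)
    fields: List[FieldDef]


def check_filetype(datatypes: List[DataType]) -> List[str]:
    """Return a list of problems; an empty list means the filetype is consistent."""
    known = {d.name for d in datatypes}
    problems = []
    for d in datatypes:
        if d.kind not in ("Record", "Union"):
            problems.append(f"{d.name}: unknown kind {d.kind}")
        for f in d.fields:
            if f.fieldtype not in PRIMITIVES and f.fieldtype not in known:
                problems.append(f"{d.name}.{f.name}: unknown fieldtype {f.fieldtype}")
            if f.dimension < 1:
                problems.append(f"{d.name}.{f.name}: dimension must be at least 1")
    return problems


# A linked-list node (with a 16 character name) and a two-way union.
filetype = [
    DataType("Node", "Record", [FieldDef("name", "Char", 16), FieldDef("next", "Node")]),
    DataType("Value", "Union", [FieldDef("i", "Integer"), FieldDef("r", "Real")]),
]
problems = check_filetype(filetype)
bad = check_filetype([DataType("Bad", "Record", [FieldDef("p", "Missing")])])
```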

The links or references to other datatype objects pose a 
problem. In a relational model, tuples are linked to each other 
by matching key values. In programming languages this 
linkage is handled with pointers. The pointer solution provides 
the efficiency that we require but when pointers are written to 
secondary storage they lose all meaning. In addition the pointer 
model does not provide the flexibility found in the relational 
model. The STUF solution is to create a variable length array 
of tuples for each datatype in the file. A reference to a tuple is 
then stored as an index into this array. New and Dispose 
procedures are provided to allocate and deallocate tuples within 
a data type. In addition, the undeleted tuples in one of the 
arrays can be treated as a relation in the sense of a relational 
database to provide an associative access method for tuples. 
Since tuple indices do not lose their meaning when saved to 
disk or passed to another program we have our needed data 
facility. 
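
A minimal Python sketch of the index-based reference idea follows. It is an illustration under stated assumptions rather than the STUF code: the class `TupleArray`, its JSON serialization and the reuse of disposed slots are all choices made here for the example. It shows the property the text relies on, namely that a reference stored as an array index still identifies the same tuple after a round trip through secondary storage, and that the undeleted tuples can also be searched associatively.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Tuple_ = Dict[str, Any]


class TupleArray:
    """Variable length array of tuples for one datatype; references are indices."""

    def __init__(self) -> None:
        self.slots: List[Optional[Tuple_]] = []
        self.free: List[int] = []          # disposed indices (reuse is an assumption)

    def new(self, **fields: Any) -> int:
        if self.free:
            idx = self.free.pop()
            self.slots[idx] = fields
        else:
            self.slots.append(fields)
            idx = len(self.slots) - 1
        return idx

    def dispose(self, idx: int) -> None:
        self.slots[idx] = None
        self.free.append(idx)

    def get(self, idx: int) -> Tuple_:
        tup = self.slots[idx]
        if tup is None:
            raise KeyError(f"tuple {idx} has been disposed")
        return tup

    def select(self, predicate: Callable[[Tuple_], bool]) -> List[int]:
        # Relational-style access over the undeleted tuples.
        return [i for i, t in enumerate(self.slots) if t is not None and predicate(t)]

    def dumps(self) -> str:
        return json.dumps(self.slots)

    @classmethod
    def loads(cls, text: str) -> "TupleArray":
        arr = cls()
        arr.slots = json.loads(text)
        arr.free = [i for i, t in enumerate(arr.slots) if t is None]
        return arr


def walk(nodes: TupleArray, head: Optional[int]) -> List[str]:
    names, idx = [], head
    while idx is not None:
        tup = nodes.get(idx)
        names.append(tup["name"])
        idx = tup["next"]
    return names


nodes = TupleArray()
c = nodes.new(name="c", next=None)
b = nodes.new(name="b", next=c)
a = nodes.new(name="a", next=b)
restored = TupleArray.loads(nodes.dumps())   # simulate saving to disk and reloading
```

Walking `restored` from index `a` visits the same three tuples as before saving, which a raw memory pointer could not guarantee.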

One more point to consider is that of cursors for editing. When 
one is editing it is usually performed by moving some cursor or 
current data object pointer. Editing and browsing operations are 
then performed relative to this current data object pointer. 
When editing a STUF data file, such a cursor is represented as 
a tuple index. Many of the editing templates described below 
will be defined in terms of such cursors. 

MIKE

MIKE is our dialogue handling system which is described in 
more detail elsewhere [OLS85c]. The important point to 
understand is how MIKE models the input dialogue. The basic 
interactive unit in MIKE is the command. A MIKE command is 
simply a Pascal procedure or function. MIKE accepts as its 
initial dialogue description a set of such procedures and then 
generates a compiled interface to these procedures. Interactively 
all procedures are presented in a menu and the selected menu 
item becomes the current command. Having selected a 
command, MIKE then prompts with a menu of all functions 
whose result type is the same as the type of the command's 
first parameter. These types can be any type from the 
application program. In addition to the functions in the menu, 
MIKE can supply primitive inputs for integer, real, string, 
function key and point types. MIKE continues accepting 
inputs until a complete command expression has been parsed. 
The command expression syntax is very similar to Pascal
procedure invocation syntax with the exception that no
punctuation, such as commas and parentheses, is actually input.
MIKE's interactive use of functions and procedures is similar
in many ways to Smalltalk's interactive use of methods
[GOL83].
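
The type-directed prompting described above can be sketched in Python using type annotations in place of Pascal declarations. This is only an illustration, not MIKE's implementation; the classes `Shape` and `Point`, the sample routines, and the helpers `parameter_types` and `menu_for` are hypothetical. Given a selected command, the menu for each parameter lists the functions whose result type matches that parameter's type.

```python
import inspect
from typing import Callable, List, get_type_hints


class Shape:
    pass


class Point:
    pass


# Functions: routines that return a value usable as an argument.
def pick_shape() -> Shape:
    return Shape()


def origin() -> Point:
    return Point()


def last_point() -> Point:
    return Point()


# Commands: procedures invoked for their effect.
def move(shape: Shape, to: Point) -> None:
    pass


def delete(shape: Shape) -> None:
    pass


FUNCTIONS: List[Callable] = [pick_shape, origin, last_point]


def parameter_types(command: Callable) -> list:
    """Types of the command's parameters, in declaration order."""
    hints = get_type_hints(command)
    return [hints[name] for name in inspect.signature(command).parameters]


def menu_for(param_type: type, functions: List[Callable]) -> List[str]:
    """Names of the functions whose result type equals the wanted type."""
    return [f.__name__ for f in functions
            if get_type_hints(f).get("return") is param_type]


first_menu = menu_for(parameter_types(move)[0], FUNCTIONS)
second_menu = menu_for(parameter_types(move)[1], FUNCTIONS)
```

A real dialogue would repeat this step for the parameters of whichever function the user picks, until a complete command expression has been built.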

Given such a primitive interface, a profile editor can be used to 
improve it. The profile editor allows menus to be restructured,
prompts and echoes changed, icons drawn, function buttons
mapped to commands and help texts written. This fleshes out
and enhances the user's view of the interface but it does not 
change the underlying model of command procedures and 
functions. The profile editor serves a similar role to Buxton's 
MenuLay [BUX83] in editing external presentations of 
dialogues. 

This command model has proven to be much simpler to use 
than the state machine and grammar approaches that we have 
used previously. For the purposes of our editing model of
interaction we can characterize all changes to the application
data structure as Pascal procedures or functions. That is, for
each kind of change to be made to the data structure we
generate a procedure which makes the change and performs
any necessary display updates. The generator then informs
MIKE of the procedures' names and parameter types. From
this information MIKE can create the necessary user input
dialogue interface.

Editing Templates 

The editor generation concept is based on the idea of editing 
templates. An editing template is designed as a presentation of a 
particular general class of data structures. Linked lists, symbol 
tables and trees are examples of such structure classes. An 
editing template then consists of a set of routines to do the 
following: 

a.    display application data from a STUF file using the
      model,

b.    provide data structure traversal commands for
      browsing through the data image being displayed
      by the model, and

c.    provide the editing commands for creating, deleting
      and/or modifying the data presented using the
      model.

It should be noted that the commands provided must, in 
addition to performing their intended tasks, also update the 
display to reflect the results of their tasks. It is the display 
update which is most important to our discussion here. 

In addition to the services that an editing template provides, 
each template also has a set of parameters which are used when 
creating an instance of a template. In this sense an editing 
template can be thought of as a macro except that the 
parameters may be lists and other data structures rather than 
simple text to be substituted. 

In Smalltalk an editing template would be defined as a class. 
The services that it provides would be methods of the class and 
the parameters would be methods of the objects that the 
template is manipulating. In Ada an editing template could in
most cases be represented as a generic package [GEH84]. Our
work has been done in Pascal using a macro preprocessor but 
the concepts are the same. An editing template then is 
characterized by the data structure that it represents. An 
example is given below of an editing template for linked lists. 

Linked-List Editing Template

As a first example of how editing templates function a linked 
list is appropriate. Since an editing template is meant to be a 
generic capability it must know a number of things about the 
linked list that it is to display. For example it must know how 
to find the head of the list, what field is used as the link for the 
list, how to display one of the elements of the list and how 
much display space to allocate to each element. This kind of 
information is provided by the following set of parameters. 
These parameters are all preceded by a percent sign so that their 
text is easily recognized in the generated routines. 

%ObjType - The data type of the list elements to be
     displayed.

%Link - The field that is used to link elements of
     ObjType together.

%CurObj - The tuple index variable that is to be used as
     the cursor in moving up and down the linked list.
     This is an index into ObjType's array.

%HeadType - The type of object where the head of the
     list is stored.

%HeadField - The field in HeadType that points to the
     head of the linked lists being displayed. This field
     must reference data type ObjType.

%CurHeadObj - The tuple index variable that references
     the HeadType tuple which contains the head of the
     current list being displayed.

%XExt and %YExt - These contain the X and Y sizes of
     the screen space to be allocated to each displayed
     element.

%Window - This is the number of the window that the
     list is to be displayed in.

%ObjDisp( ObjIndex; X,Y) - This is a routine to be
     called to have the list element referenced by
     ObjIndex displayed on the screen at location
     (X,Y).

%ObjDel( ObjIndex) - This is a routine which will clean
     up and delete the list element referenced by
     ObjIndex.

%EditSpace - This is the number of list element spaces to
     be left empty on the screen so that new elements
     can be inserted into the list without necessitating a
     repainting of the entire list.


The various interactive tasks that one would want to perform
on the display of a linked list would include moving a cursor
up and down the list (scrolling the list if necessary), inserting a
new element in the list and deleting the current element from the
list. In addition to these primitive operations one may also want
to select a current item from the list using some other criteria
such as a name. To accomplish all of these the template would
provide the following command procedures directly to MIKE.


%WindowUp:
     { move the list cursor up one element, scrolling if
     necessary }

%WindowDown:
     { move the list cursor down one element, scrolling
     if necessary }

     { if, because of the size and shape of the window,
     the list is displayed in multiple columns then this
     will move the cursor left or right as the case
     may be and update %CurObj appropriately }

     { if the list is longer than will fit in the window
     then this will move the cursor backward and
     forward through the list one page at a time }

%WindowDelete:
     { delete the current object pointed at by
     %CurObj }


In addition to these command procedures which are exposed to 
MIKE the following additional service routines are generated. 


Restore%Window:
     { This completely refreshes the window whenever
     the window itself changes or the value of
     %CurHeadObj changes. }

%WindowUpdateCur:
     { This will simply update the display of the current
     object due to some modification to it by some
     other command }

%WindowInsert:
     { This will insert the specified object into the
     linked list immediately after %CurObj and make it
     the current object, updating the display
     appropriately }

%WindowChangeCursor( ObjIdx):
     { This will change %CurObj to the value of
     ObjIdx and update the screen appropriately }

Note that all of the names of the generated routines are
parameterized by %Window so as to make them unique. Note
also that there is no insert command exposed directly to MIKE
because of the variety of ways that an application may want to
create and initialize elements of the list. After such an element is
created by some command procedure the %WindowInsert
service procedure can be called which will handle the list
insertion and screen update tasks properly.

Note that the actual display of an element is left up to an
application supplied procedure. In many cases we have had
lists of lists to display. This is handled with two windows,
%W1 and %W2 for example. %W1 displays the main list and
can be browsed and edited as shown above. %W1's %CurObj
cursor is also the %CurHeadObj for %W2. The %ObjDisp
procedure for %W1 contains a call to Restore%W2 so that any
change of the cursor in %W1 will cause an update of %W2. By
combining the linked list template with the various other
templates a large number of data display and manipulation
techniques can be implemented very quickly.
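
To make the linked-list template more concrete, here is a minimal Python sketch of one instantiated template. It is not the macro-generated Pascal described above; the class `LinkedListTemplate` and its method names are hypothetical stand-ins for the generated routines, the list is assumed to be non-empty, and cursor highlighting, scrolling and %EditSpace are left out. The point it illustrates is the display update: a full restore redraws every element, while an insert redraws only the elements from the insertion point onward.

```python
from typing import Callable, Dict, List, Optional

Tuples = Dict[int, dict]


class LinkedListTemplate:
    def __init__(self, tuples: Tuples, link: str, head: int,
                 obj_disp: Callable[[int, int, int], None], y_ext: int) -> None:
        self.tuples = tuples      # stands in for ObjType's tuple array
        self.link = link          # plays the role of %Link
        self.head = head          # head of the list being displayed
        self.cur = head           # plays the role of %CurObj
        self.obj_disp = obj_disp  # plays the role of %ObjDisp
        self.y_ext = y_ext        # plays the role of %YExt

    def _order(self) -> List[int]:
        order: List[int] = []
        idx: Optional[int] = self.head
        while idx is not None:
            order.append(idx)
            idx = self.tuples[idx][self.link]
        return order

    def restore(self) -> None:
        """Full refresh: redraw every element of the list."""
        for row, idx in enumerate(self._order()):
            self.obj_disp(idx, 0, row * self.y_ext)

    def down(self) -> None:
        nxt = self.tuples[self.cur][self.link]
        if nxt is not None:
            self.cur = nxt

    def up(self) -> None:
        order = self._order()
        pos = order.index(self.cur)
        if pos > 0:
            self.cur = order[pos - 1]

    def insert(self, idx: int) -> None:
        """Link idx in after the cursor and redraw only the rows that moved."""
        self.tuples[idx][self.link] = self.tuples[self.cur][self.link]
        self.tuples[self.cur][self.link] = idx
        self.cur = idx
        order = self._order()
        for row in range(order.index(idx), len(order)):
            self.obj_disp(order[row], 0, row * self.y_ext)


drawn = []
tuples = {
    0: {"name": "a", "next": 1},
    1: {"name": "b", "next": None},
    2: {"name": "c", "next": None},   # created by the application, not yet linked
}
window = LinkedListTemplate(
    tuples, "next", 0,
    lambda idx, x, y: drawn.append((tuples[idx]["name"], y)), 10)
window.restore()
full_redraw = list(drawn)
drawn.clear()
window.insert(2)
```

After the insert, only the new element and the one below it have been redrawn; the first element is untouched, which is the kind of saving the template is meant to provide.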

Other possibilities 

The set of services provided above is not necessarily complete 
nor the only possible approach. For example the linked-list 
template could have a clipboard variable added as a parameter 
and then supply to MIKE the necessary commands to Cut, 
Paste and Copy list segments to and from the window's 
clipboard. An additional feature might be selecting an element from the
list with a mouse rather than scrolling through the list.

Other data structuring mechanisms that we have implemented 
include unordered and sorted associative lists. Such lists are 
accessed by name or some other criteria. These two techniques 
view a given STUF datatype as a table of tuples for a relation 
and allow scrolling through and editing of such tables based on 
application supplied sort orders and selection criteria.

These structure editing templates are being combined with a
forms editor which handles the element display functions.
Other templates which we are still working on would handle
network or schematic type displays such as is shown in Figure 3.

[Figure 3. Network or schematic type display.]

Such display forms are more difficult because of the layout and
connection routing aids that one would want to provide
automatically.

Summary 

Our User Interface Management System therefore consists of a
command based dialogue editor called MIKE which provides
quick, easily learned prototyping of dialogues along with
refinement of the dialogue via a profile editor. The interaction is
viewed as an editing of a file in STUF format which has the
expressive power of both relational and linked data structures.
Display tasks are handled by formats which describe how data
should appear along with editing templates which provide
generic data editing and screen update procedures for various
data display techniques.

The main difference between this approach and other UIMS 
approaches is that the interaction is viewed as a data editing 
process rather than simply as an input dialogue parsing 
problem or a screen management problem. Having adopted 
this view one can then identify generic classes of display and 
editing techniques for various data organizations. Given such a 
class a template is developed which carefully links the input 
dialogue with the display management functions to provide 
vastly improved interactive response. Such a class can then be 
applied to specific data editing problems simply by binding the 
parameters.

Assuming that one has an application whose data organization 
can suitably use the editing templates provided (which is an 
open-ended set) then an interactive browser / editor can be
created in a matter of days rather than months. For those parts 
of such an application which do not match one of the existing 
templates new code can be written and easily integrated with 
the templates. In fact after such code has been written it should 
be examined for its potential to become a new template itself. 

ABSTRACT

The research reported here is focussed on the issues
involved in automatically generating the presentation component
of user interfaces. The design and implementation of
the presentation component of the University of Alberta User
Interface Management System are described. The system is
used for automatically generating graphical user interfaces for
interactive applications. The system has been designed to
keep the other components of the user interface device
independent, keep the designer's interest alive in the design
process, make the design process less cumbersome, and
reduce the burden of programming as far as possible. The
results presented in this report are based on the experience
gained through implementing a system to generate the presentation
component of user interfaces automatically. The
presentation component can be viewed as the lexical level of
the user interface.

1. Introduction 

There has been a growing awareness in software design
of the importance of the user. This concern has manifested
itself, for example, in analysis of desirable properties of user
interfaces [Cheriton76] and in investigations into the user-friendliness
of interactive systems. The concept that the user
interface can be treated as a separate module within the
whole system, and not simply embedded at a range of points
through it, is gaining acceptability [Edmonds81]. The effort
now is to make user interfaces more interactive, graphic, forgiving,
and self-explanatory. But, unfortunately, the construction
of good user interfaces even today remains an expensive,
time-consuming, and often frustrating process [Buxton83].
This prompted researchers in human factors to
explore the possibility of automatically generating user interfaces
and the notion of a User Interface Management System
(UIMS). This paper describes a tool for automatically generating
graphical user interfaces for interactive programs and
explores the issues related to the process.


1.1. What is A User Interface? 

The user interface is the part of a system that handles 
the interaction between the user and the other components of 
the system. In order to complete a useful task the system 
accepts inputs and presents outputs through the user interface.
As more interactive systems of comparable functionality
become available, their success in the market place is
based increasingly on ease of use. Bad user interfaces often 
cause unnecessary loss of productivity and aggravation. Ease 
of use, not ease of implementation, has become the crucial 
design consideration. 

The basic structure of a user interface does not change 
radically over a wide range of applications [Green84a]. There 
are a number of functions that must be performed by most 
user interfaces. These functions include error detection and 
recovery, user protocoling, and undo processing. The concepts
of a separate user interface module, separate interface
designer, and the common features of the user interfaces
have led to the notion of UIMS.


1.2. Automatic Generation of User Interfaces 

The fact that the basic structure of a user interface does 
not change radically over a wide range of programs and that 
functions like error detection, error recovery, and help are 
common to almost all user interfaces leads to the idea of 
automatic generation of user interfaces. The high cost and
large turnaround time for hand coded user interfaces provide
additional motivation for the idea.

The automatic generation of the user interfaces has the 
following advantages: 

1) It reduces the cost of producing user interfaces. 

2) It provides a much shorter lead time than the hand coding
of the interfaces.

3) The low cost and short lead time for the production of 
the user interfaces makes it possible to experiment with 
new ideas in user interface design. 

4) Once the user interface generator is debugged com¬ 
pletely, the software it generates is more reliable than 
hand coded software. 

5) A particular user interface generator may be used to
generate a number of user interfaces which are consistent
in their overall approach to functions such as error
reporting and help. Familiarity with one such user
interface can expedite the learning of the others.

1.3. What is a UIMS? 

A UIMS is a collection of software tools supporting the
design, specification, implementation, and evaluation of user
interfaces [Seattle83]. It mediates the interaction between a
user and an application, satisfying user requests for
application actions and application requests for data from the
user. It thus allows the application programmer's problem
specific skills to be concentrated on the application, freed
from detailed concern with managing the flow of user actions
and responses. UIMSs have also been called "Dialogue
Management Systems" [Roach82] or "Abstract Interaction
Handlers" [Feldman82]. Over the past few years many models of
UIMSs have been proposed and implemented [Newman68],
[Kasik82], [Guest82], [Buxton83], [Jacob83], [Olsen Jr.83].

2. The University of Alberta UIMS 

The University of Alberta UIMS [Green85], [Sinah85], 
[Lau85], [Chia85] is based on the Seeheim model of user 
interfaces discussed in section 2.1. The design and
implementation details of the presentation component of the
University of Alberta UIMS are described in this paper. Three
main notations
have been used for specifying the dialogue between the user 
and computer. These notations are: recursive transition net¬ 
works, BNF grammars, and events. A system accepting dialo¬ 
gues specified by recursive transition networks is discussed 
in [Lau85]. Details about an event language and its implemen¬ 
tation can be found in [Chia85]. At the present time the imple¬ 
mentation of a grammar based notation has not been started. 
Support for the application interface model is currently under 
development. 

2.1. The Seeheim Model of User Interfaces

In the Seeheim model of user interfaces [Green84b] a 
user interface is divided into three components as shown in 
Figure 1. The presentation component can be viewed as the 
lexical level of the user interface. This component is respon¬ 
sible for managing the input and output devices used by the 
user interface. All the interaction techniques and display for¬ 
mats are defined in this component. It reads the physical 
input devices and converts the raw input data into the form 
required by the other components in the user interface. The 
user interface employs an abstract representation for the 
input and output data. This representation consists of a type 
or name that identifies the kind of data, and the collection of 
values that define the data item. This chunk of information is 
called a token. 

While the presentation component is responsible for 
converting user actions to input tokens, the dialogue control 
component defines the set of legal input tokens. It interprets
the sequence of input tokens produced by the presentation 
component to determine the operations the user wants to 
perform. Once a complete command has been formed from 
the input tokens the dialogue control component uses the 
application interface model to invoke the appropriate routines 
in the application. Similarly, the output tokens sent by the
application interface model are interpreted by dialogue
control and transformed into a format acceptable to the user.
This component contains the control logic of the user
interface. Most existing UIMSs have concentrated on this
component of the user interface.

The application interface model is a representation of 
the functionality of the application. It represents the user 
interface's view of the application. The application interface 
model contains the descriptions of the major data structures 
maintained by the application, and the application routines that 
can be invoked by the user interface. It also covers the mode
of communication between the user interface and the
application. The user interface may communicate with the
application in one of three possible modes of interaction:
user initiated, system initiated, or mixed initiative.

3. Design of the Presentation Component 

The basic job of the presentation component is to con¬ 
vert user interactions with the input devices into input tokens, 
and convert output tokens into images on the output devices. 
The basic structure of the presentation component of the 
University of Alberta UIMS is presented in Figure 2. In the 
following sections a description of the important concepts 
related to the presentation component of the University of 
Alberta UIMS is presented. 

3.1. Input Tokens 

The input tokens convey information about the user's 
interactions with the user interface to the other parts of the 
UIMS. The raw data generated by the user interactions with 
the input devices is manipulated and restructured by the 
interaction techniques and control module. This new struc¬ 
ture, called an input token, is sent to the other parts of the 
UIMS. An input token represents exactly one unit of informa¬ 
tion as far as the UIMS is concerned. An input token contains 
the following information. 

Token Number 

Token Value 

The token number is the unique number assigned to each 
type of input token by the presentation component of the 
UIMS. The interpretation of the token value depends upon the 
token number. 


3.2. Output Tokens 

The output tokens are used for generating images. 
These tokens are generated by the dialogue control com¬ 
ponent as well as the application, and are sent to the control 
module of the presentation component for further process¬ 
ing. The control module invokes the display procedure associ¬ 
ated with the output token and ensures that the image is gen¬ 
erated in the appropriate window. An output token contains 
the following fields. 

Token number 

Token value 

The token number is the unique number assigned to each 
type of output token by the presentation component of the 
UIMS. Using the token number as the key the control module 
finds the associated display procedure and the window name. 
The interpretation of the token value is left to the display pro¬ 
cedure. Usually this field points to a structure defined in the 
token definition file. The token definition file contains the 
definitions of the structures the value field of an input or out¬ 
put token could point to. 
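
As a minimal sketch, a token of the kind described in sections 3.1 and 3.2 might be declared in C as shown here. The names token, point_value, make_token, token_as_point, TOK_PICK_POINT and TOK_SHOW_ANGLE are hypothetical and are not taken from the University of Alberta UIMS; the sketch only illustrates the idea that the token number selects how the token value is interpreted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers; in the paper these are assigned by the
   presentation component, not written by hand. */
enum { TOK_PICK_POINT = 1, TOK_SHOW_ANGLE = 2 };

/* Example of a value structure such as a token definition file
   might declare (assumed layout). */
struct point_value { int x, y; };

/* A token: a number naming its type, and a value whose
   interpretation depends on that number. */
struct token {
    int   number;
    void *value;
};

struct token make_token(int number, void *value)
{
    struct token t;
    t.number = number;
    t.value = value;
    return t;
}

/* Interpret the value as a point only if the token number says so. */
const struct point_value *token_as_point(const struct token *t)
{
    assert(t != NULL);
    if (t->number != TOK_PICK_POINT)
        return NULL;
    return (const struct point_value *)t->value;
}
```

Keeping the value behind an untyped pointer mirrors the statement that the value field usually points to a structure defined elsewhere; a tagged union would be an equally reasonable choice.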

3.3. Control Module 

The control module is responsible for all communication
with the other parts of the user interface. This communication
includes sending the input tokens to the dialogue control
component and receiving the output tokens from the other
parts of the user interface.

The other important function of the control module is to 
perform an external-internal mapping. This mapping deter¬ 
mines how the user's actions are converted into input tokens 
and how output tokens are converted into images. The 
external-internal mapping can be viewed as a dictionary used 
by the control module for interpreting user actions and output 
tokens. For input tokens the control module uses the event
number and the window name of the input event to determine 
the input token number. In the case of output tokens, it 
determines the window where the image will be generated 
and the display procedure to be used from the output token 
number. 
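
The dictionary view of the external-internal mapping can be sketched as two small lookup tables. This is an illustration under assumed names (input_map_entry, output_map_entry, lookup_input_token, lookup_output, angle_display); the paper does not give the actual data structures, and a linear search is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A display procedure consumes the value of an output token. */
typedef void (*display_proc)(const void *value);

struct input_map_entry {
    int event;           /* device event number */
    const char *window;  /* window in which the event occurred */
    int token;           /* resulting input token number */
};

struct output_map_entry {
    int token;           /* output token number (the key) */
    const char *window;  /* window in which to draw */
    display_proc proc;   /* procedure that draws the image */
};

/* (event number, window name) -> input token number, or -1. */
int lookup_input_token(const struct input_map_entry *map, size_t n,
                       int event, const char *window)
{
    assert(map != NULL && window != NULL);
    for (size_t i = 0; i < n; i++)
        if (map[i].event == event && strcmp(map[i].window, window) == 0)
            return map[i].token;
    return -1;
}

/* Output token number -> (window, display procedure), or NULL. */
const struct output_map_entry *
lookup_output(const struct output_map_entry *map, size_t n, int token)
{
    assert(map != NULL);
    for (size_t i = 0; i < n; i++)
        if (map[i].token == token)
            return &map[i];
    return NULL;
}

/* Example display procedure: records the angle it was asked to show. */
int last_angle_shown = 0;
void angle_display(const void *value)
{
    last_angle_shown = *(const int *)value;
}
```

The same event number in two different windows yields two different input tokens, which is why the window name is part of the key.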

3.4. Library of Interaction Techniques 

Most interaction tasks are performed by a set of 
interaction techniques. An interaction technique is defined as 
a way of using a physical input device to enter a certain type 
of word (command, value, location, etc.), coupled with the
simplest form of feedback from the system to the user 
[Foley81]. 

There are a large number of possible interaction tech¬ 
niques. Each interaction technique is suitable for a particular 
function. The set of interaction techniques available to a 
designer remains very limited if he/she has to develop one 
every time it is required. To make the designer's choice 
wider a library of interaction techniques can be created. This 
library can be used by any designer while deciding on which 
interaction techniques to use. Every time a new technique is 
developed it can be added to the already existing library. In 
this way one can keep building the library and help reduce the 
cost and time of producing good user interfaces. 

3.5. Library of Display Procedures 

A display procedure is a procedure that consumes output
tokens. In the process of consuming output tokens the display
procedure produces images on the graphics display. The image
represents the data received in the output token. Each display
procedure has a specific purpose and is used to generate a
specific image. Examples of display procedures are Angle
Display, Vertical Bar Display, and Text Windows.

Fig. 1. The Seeheim Model of a User Interface

Fig. 2. The Structure of the Presentation Component

The idea of having a library of display procedures is 
similar to that of the library of interaction techniques. By 
adding new display procedures to the library a large body of 
procedures can be built. This library can then be used for fast 
and economical production of user interfaces. 

4. The Implementation 

The presentation component of the University of Alberta
UIMS has been implemented on a VAX 11/780 running UNIX
4.2 BSD. The programs used for implementing the system
are written in the programming language C.

4.1. Structure of the Presentation Component 

The presentation component of the University of Alberta 
UIMS is responsible for the following activities. 

Screen management

Information display

Associating interaction techniques with windows

Associating display procedures with output tokens

Assigning unique token numbers to input and output tokens

Converting user interactions into input tokens

Converting output tokens into images

Lexical feedback

Adapting to different display devices, if possible

An interactive approach to the design of the presentation
component of the user interfaces has been adopted in
this UIMS. There are two steps involved in the complete 
design. The first step is the specification step. In this step 
the user interface designer interactively specifies the design 
information. This information is then used in the second step 
for generating the presentation component of the user inter¬ 
face and providing run-time support. To support both these 
functions the presentation component of the University of 
Alberta UIMS is divided into two logically independent parts. 
The first part, called "ipcs" (interactive presentation
component specification), accepts the design specifications
from the designer and generates a data base, token tables, and
'C' procedures. The second part, called "pcg" (presentation
component generation), consists of a number of procedures
which provide run-time support for the presentation component
of the user interfaces. This part is driven by the data base
and the token tables. The 'C' procedures produced by ipcs are
compiled and linked with the pcg procedures. The entire
sequence of creating a presentation component is shown in
Figure 3. In implementing the presentation component a
graphics package called WINDLIB [Green84c] and a data base
package called FDB [Green83] are extensively used.

*UNIX is a trademark of AT&T Bell Laboratories.

4.2. The Specification Step 

The complete specification of the presentation com¬ 
ponent involves the specification of the following com¬ 
ponents. 

Screen layout

Input tokens and interaction techniques associated with windows

Output tokens and display procedures

Menu layouts

The interactive specification program "ipcs" is used to 
enter the design information. The ipcs screen is divided into 
four areas as shown in Figure 4. The work area corresponds 
to the display screen of the user interface being designed. 
The designer positions windows and menus in this area. 
Above the work area is a text area used for help and error 
messages. The right side of the screen is used for the ipcs 
menu. An area across the bottom of the screen displays 
some of the attributes of the current window in the work 
area. In the following sections a brief description of the 
functions performed by ipcs is presented. 

4.2.1. Window Definition 

Ipcs starts off by displaying the layout shown in Figure 
4. At the start of the specification session the work area 
corresponds to the device window on which the user inter¬ 
face will be implemented. All the windows created by the 
user interface designer are children of this window. To start 
defining windows the designer selects the "Window Definition"
command from the ipcs menu. A window can then be
defined by pointing at its two opposing corners. Once a win¬ 
dow is defined it can be removed, stretched to a different 
size, or moved to a new position. An arbitrary number of 
windows can be defined at each level. 
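
Defining a window by pointing at two opposing corners implies a small normalization step, since the designer may pick the corners in any order. A minimal sketch of that step is given here; the rect structure and the function names are assumptions made for illustration and are not part of ipcs or WINDLIB.

```c
#include <assert.h>

/* A window rectangle: one corner plus a non-negative size
   (hypothetical representation). */
struct rect { int x, y, w, h; };

/* Build a rectangle from two opposing corners picked in any order. */
struct rect rect_from_corners(int x1, int y1, int x2, int y2)
{
    struct rect r;
    r.x = x1 < x2 ? x1 : x2;
    r.y = y1 < y2 ? y1 : y2;
    r.w = x1 < x2 ? x2 - x1 : x1 - x2;
    r.h = y1 < y2 ? y2 - y1 : y1 - y2;
    assert(r.w >= 0 && r.h >= 0);
    return r;
}

/* Return 1 if child lies entirely inside parent, else 0. This is a
   check a tool might apply, since all windows are children of the
   work area. */
int rect_contains(struct rect parent, struct rect child)
{
    return child.x >= parent.x &&
           child.y >= parent.y &&
           child.x + child.w <= parent.x + parent.w &&
           child.y + child.h <= parent.y + parent.h;
}
```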

4.2.2. Window Attributes 

The window attributes can be defined by selecting the
"Window Attributes" command from the ipcs menu. This
command assigns and displays default attributes for each
window in the work area. The default attributes consist of the
window name, window limits, background colour, drawing
colour, and boundary colour. The value of an attribute can be
changed by pointing at it and entering a new value.

In this step an interaction technique can be associated 
with a window. An interaction technique can be selected 
from the library of interaction techniques, or it can be a pro¬ 
cedure written by the user interface designer. The name of 
the output token associated with the window is also specified 
in this step. On receiving this token the run-time support 
module creates the window. It is important to note that it is
the output token name, not the window name, which is used 
by the run-time support module. 

Ipcs is capable of handling windows of variable size. In 
a normal case, depending upon the size of the window, one, 
five, or ten window attributes are displayed at a time. In the 
case of one or five attributes, the next set of attributes can 
be displayed by moving the tracking cross inside the window 
and hitting carriage return. To be able to define or change 
attributes for a very small window, its size can be temporarily 
adjusted. The size of such a window is adjusted for the pur¬ 
poses of display only. 

The system also allows the use of overlapping windows. 
In the case of overlapping windows, the attributes displayed 
in one window overlap with the attributes in the other win¬ 
dow. These windows can be flipped by pointing at the 
desired window. The selected window temporarily becomes 
the top-most window and its attributes become visible. 

It is not necessary to complete the definition of all the 
attributes of a window at one time. The designer can post¬ 
pone the definition of all or some of the window attributes 
for a later time. The system does not force any pre¬ 
determined sequence of specification steps to be followed. 

The attributes for a window can be changed as often as 
desired. The system does not differentiate between changing 
a default attribute, or a designer-defined attribute. This facili¬ 
tates the interactive design of user interfaces. This mechan¬ 
ism for accommodating changes in the specification also 
helps in adapting user interfaces to individual users. The 
interaction technique or colours associated with a window, 
for example, can be easily changed to the actual user's liking. 

4.2.3. Menu Definition 

The "Menu Definition" command is used for defining
menus. A menu is always associated with one of the win¬ 
dows defined in the work area. A menu consists of a menu 
header and a variable number of menu items. The menu 
header contains information affecting the appearance and 
location of the menu. This information consists of menu 
name, menu type, menu items placement option, menu orien¬ 
tation, and menu output token. Menu name is entered by the 
designer. A meaningful menu name can be used to remember 
the purpose or contents of the menu. The system provides 
facilities for fixed as well as pop-up menus. The default
menu type is "fixed". The placement of menu items can be
handled automatically by the system at run time: the menu
area is divided equally among the menu items, and the system
then centers the text and icons. The designer, however, has
the option of specifying the location and size of individual
menu items in the menu. The default menu items placement
option is "system"; it can be changed to "designer". The
default menu orientation is "vertical"; it can be changed to
"horizontal". Each menu is assigned an output token. On
receiving this output token the run-time support module
displays the menu. The run-time support module recognizes the
menus by their output tokens, not by the menu names.

Fig. 3. Sequence for Constructing a Presentation Component
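
The automatic placement of menu items described above (equal division of the menu area, followed by centering) can be sketched as shown here. The rect and orientation types and the function names are hypothetical; the paper does not state how leftover pixels are distributed when the area does not divide evenly, so this sketch simply leaves them unused.

```c
#include <assert.h>

struct rect { int x, y, w, h; };
enum orientation { MENU_VERTICAL, MENU_HORIZONTAL };

/* Rectangle of item `index` (0-based) when `count` items share the
   menu area equally along the menu's orientation. */
struct rect menu_item_rect(struct rect menu, enum orientation o,
                           int count, int index)
{
    assert(count > 0 && index >= 0 && index < count);
    struct rect r = menu;
    if (o == MENU_VERTICAL) {
        r.h = menu.h / count;
        r.y = menu.y + index * r.h;
    } else {
        r.w = menu.w / count;
        r.x = menu.x + index * r.w;
    }
    return r;
}

/* Left edge that centers a label of width label_w inside an item. */
int centered_x(struct rect item, int label_w)
{
    return item.x + (item.w - label_w) / 2;
}
```

For a 100 by 300 vertical menu with three items, the second item occupies the band from y = 100 to y = 200, and a label 40 units wide starts at x = 30.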

A variable number of menu items can be associated with 
a menu. Each menu item occupies a rectangular area within 
the menu area. A menu item can be labeled either by an icon 
or text strings. A menu may consist of a mixture of iconic 
and textual menu items. For each menu item the designer 
specifies its type: iconic or textual. In the case of textual
menu items one or more lines of text can be associated with 
the item. 

In the case of icons the system provides a library of 
icons. An icon may be selected from this library, or it can be 
produced by using an interactive editor, called ICON 
[Green86], developed at the University of Alberta. To
associate an icon with a menu item the name of the procedure
drawing the icon is entered.

The system allows the designer to associate more than 
one menu with a window. This helps in creating a hierarchy of 
menus. It is important to note that the menus may be 
displayed in any order. It is not necessary to display the 
menus in the order they are defined. Therefore, though the 
menu definition hierarchy is simple, the menu display hierarchy 
can be as complex as desired. 

4.2.4. Input Token Definition 

Input tokens can be associated with a window by select¬ 
ing the "Input Token Definition" command from the ipcs menu. 
The input tokens convey information about the user's interac¬ 
tions with the user interface to the other parts of the UIMS. 
A variable number of input tokens can be associated with a 
window. An input token definition consists of token name 
and the associated event number. The system allows the 
designer to delete or add any number of input tokens during a 
specification session. 

4.2.5. Output Token Definition 

Output tokens can be associated with a window by 
selecting the "Output Token Definition" command from the
ipcs menu. For output tokens the designer specifies the name 
of the token along with the name of a display procedure. On 
receiving the output token, the run-time support module 
invokes the associated display procedure. Based on the 
information contained in the output token, the display pro¬ 
cedure produces the image in the window in which the output 
token is specified. 

More than one output token can be associated with a 
window. Different output tokens are used to produce dif¬ 
ferent images in the same window. The system allows the 
designer to modify or delete an output token after its defini¬ 
tion. 

4.2.6. Next and Previous Level Definitions 

To define the next level in the tree of windows, the
designer selects the "Next Level" command from the ipcs menu
and points to a window in the work area. This window now
becomes the new parent. Some of the important attributes of
this window are displayed at the bottom of the screen and the
environment switches back to the one shown in Figure 4.

A tree of windows can be created by using the "Next
Level", "Previous Level", and "Window Definition" commands
from the ipcs menu. By selecting the "Next Level" command
and pointing at a window in the work area, children of the
window pointed at can be created. The work area
corresponds to the selected window and child windows can
be created using the "Window Definition" command.

The system does not require the designer to complete 
the definition of the current level before going to the next 
level in the tree of windows. The tree can be defined 
branch-by-branch, level-by-level or by a mixture of the two 
approaches. This flexibility in designing the tree allows the 
designer to work more methodically and concentrate on one 
part of the user interface at a time. The system does not put 
any limit on the depth of the tree or number of windows at a 
particular level of the tree. 

4.3. Novice and Expert Users of ipcs 

Ipcs has two levels of use: "novice" and "expert". A
novice may use only the basic set of commands. In this mode
all the input is done through one button on the cursor puck,
and all through the ipcs session the designer is guided by help
messages. The help messages are quite detailed for the
novice. In expert mode, the designer is allowed to use other
buttons on the puck, and the messages produced by the system
are terse. The use of extra buttons on the puck allows
the designer to delete, resize, move, or temporarily change
the size or priority of a window. A profile for each user of
ipcs (called "userprofile") is created and maintained by the
program, initially tagging each user as a novice. The
userprofile file stores the status ("novice" or "expert") of
the user, the number of times the designer has invoked ipcs,
and the number of times the designer successfully finished the
specification sessions. A novice is upgraded to an expert
based on the number of successful executions of ipcs.

Fig. 4. The ipcs Screen Layout
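
The bookkeeping behind the novice and expert levels can be sketched as a small record with a promotion rule. The structure, the function record_session, and in particular the threshold EXPERT_THRESHOLD are assumptions made for illustration; the paper says only that promotion depends on the number of successful sessions and does not give the number.

```c
#include <assert.h>
#include <stddef.h>

enum user_level { NOVICE, EXPERT };

/* The three items the text says the profile stores. */
struct user_profile {
    enum user_level level;
    int invocations;   /* times ipcs was invoked */
    int successes;     /* sessions finished successfully */
};

/* Hypothetical promotion threshold; the real value is not stated. */
#define EXPERT_THRESHOLD 5

/* Update the profile at the end of one ipcs session. */
void record_session(struct user_profile *p, int finished_ok)
{
    assert(p != NULL);
    p->invocations++;
    if (finished_ok)
        p->successes++;
    if (p->level == NOVICE && p->successes >= EXPERT_THRESHOLD)
        p->level = EXPERT;
}
```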

4.4. Output of ipcs 

The output produced by ipcs is used to drive the run¬ 
time support module, and provides the interface for the 
dialogue control component and the application interface 
model. The information produced for the run-time support 
module consists of an FDB data base and 'C' procedures,
whereas the interface information for the dialogue control 
component and application interface model consists of tables 
of input and output tokens. A brief description of these files 
is presented in the following sub-sections. 

4.4.1. Data Base Description

The design information for the presentation component 
is stored in an FDB data base. An object in the data base is 
represented by a frame. A window frame points to the next 
window at the same level as well as to its child windows at 
the next level. In addition, a window frame points to the first
menu, first input token, and first output token frames
associated with the window. Each of these in turn points to the
frames providing detailed information about the objects.
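
The linkage between window frames described above is the classic first-child, next-sibling representation of a tree. A simplified sketch in C is shown here; it is not the FDB frame format, which the paper does not detail, and the names window_frame, add_child, count_windows and tree_depth are hypothetical. The pointers to menu and token frames are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct window_frame {
    const char *name;
    struct window_frame *next;        /* next window at the same level */
    struct window_frame *first_child; /* first window at the next level */
};

/* Link child into parent's list of children (at the front). */
void add_child(struct window_frame *parent, struct window_frame *child)
{
    assert(parent != NULL && child != NULL);
    child->next = parent->first_child;
    parent->first_child = child;
}

/* Number of windows in the tree rooted at root. */
int count_windows(const struct window_frame *root)
{
    if (root == NULL)
        return 0;
    int n = 1;
    for (const struct window_frame *c = root->first_child;
         c != NULL; c = c->next)
        n += count_windows(c);
    return n;
}

/* Number of levels in the tree rooted at root. */
int tree_depth(const struct window_frame *root)
{
    if (root == NULL)
        return 0;
    int deepest = 0;
    for (const struct window_frame *c = root->first_child;
         c != NULL; c = c->next) {
        int d = tree_depth(c);
        if (d > deepest)
            deepest = d;
    }
    return 1 + deepest;
}
```

Because each frame needs only two links regardless of how many children it has, this layout places no limit on the depth of the tree or on the number of windows at a level.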

4.4.2. 'C' Procedures

The 'C' procedures generated by ipcs are mainly used 
for passing parameters to the interaction techniques. The 
procedure calls are also used for loading the appropriate 
routines from the library of interaction techniques, display 
procedures, and icons. These procedures are compiled and 
linked with the other routines for run time support. 

4.4.3. File of Input/Output Tokens 

This file contains the names of all the input and output 
tokens along with a number. These numbers are assigned by 
ipcs to each of the input and output tokens. For reasons of 
efficiency this number is used for communication amongst 
various components of the user interface at run time. 

4.5. Run-Time Support Module 

The run-time support for the presentation component of
the user interface is provided by the routines in "pcg". The
following functions are supported by pcg.

Receive the user's interactions in the form of WINDLIB events, reformat them into input tokens, and send these tokens for further processing.

Receive output tokens from the dialogue control component, find the appropriate window and display procedure, and call the display procedure.

Display menus and highlight the selected menu items.

The program pcg is driven by the data base created by
the specification program ipcs. It retrieves the information
from the data base and restructures the information as
required. It is also assisted by the 'C' code generated by
ipcs. This code is compiled and linked with the other
run-time routines. Pcg is divided into three logical parts.
The first part is responsible for displaying menus and
performing the associated bookkeeping. The second part
handles the user interactions and generates the input tokens.
The third part receives the output tokens and is responsible
for their display.

5. Conclusions 

In our system attention is focussed on the issues 
involved in the automatic generation of presentation com¬ 
ponents of user interfaces. The separation of presentation 
component from the dialogue control component helps 
designers work more methodically, and may therefore result 
in better user interfaces. The approach also overcomes one 
of the major stumbling blocks in user interface design, 
namely, the representation of geometric information in textual 
form. In our design most of the geometrical information is 
entered graphically. The system provides a window based 
environment which helps designers structure the user inter¬ 
faces in a more natural way. The system also provides facili¬ 
ties for creating and maintaining a hierarchy of windows and 
menus. The interactive design of user interfaces is supported 
by allowing the designer to move to the next level in the 
hierarchy without completing the definition of all aspects of 
the user interface at the current level. The system provides 
more freedom to the designer by not imposing any predeter¬ 
mined sequence of commands for creating user interfaces. 

The second contribution of this system is to show that 
all device dependencies can be limited to the presentation 
component of the user interface. If the user interface is 
moved to a different device only the presentation component 
needs to be changed. This increases the portability of the 
user interfaces. Also, the presentation component can be 
designed to support a range of devices, and automatically 
adapt to the one in use without changing the structure of the 
dialogue. 

The third contribution of this system is to show that user 
interfaces can easily be adapted to individual users. Screen 
layout, for example, can be easily tailored for left handed 
users. The selection of interaction techniques and display 
formats can also be easily changed to the actual user's liking. 

It is also observed that the existence of a separate 
presentation component encourages the use of a standard 
library of interaction techniques. This speeds up the process 
of generating user interfaces to a great extent and reduces 
the cost of programming considerably. This reduction in cost 
and time encourages experimentation with user interfaces and 
hence increases user satisfaction. 

ABSTRACT

The rotation of a digitized raster by an arbitrary angle
is an essential function for many raster manipulation 
systems. We derive and implement a particularly fast 
algorithm which rotates (with scaling invariance) rasters 
arbitrarily; skewing and translation of the raster is also 
made possible by the implementation. This operation is 
conceptually simple, and is a good candidate for 
inclusion in digital paint or other interactive systems, 
where near real-time performance is required. 

RÉSUMÉ

The rotation of a raster by an arbitrary angle is an
essential function of many raster manipulation programs.
We have implemented a fast raster rotation algorithm that
preserves the scale of the image. We describe this system,
which also allows skewing and translation of the raster.
This operation, simple in concept, proves to be a good
candidate for inclusion in a paint system (or other
interactive system) where near real-time performance is
required.

Keywords: raster rotation, frame buffer, real-time.


INTRODUCTION 

We derive a high-speed raster rotation algorithm 
based on the decomposition of a 2-D rotation matrix 
into the product of three shear matrices. Raster 
shearing is done on a scan-line basis, and is particularly 
efficient. A useful shearing approximation is averaging 
adjacent pixels, where the blending ratios remain 
constant for each scan-line. Taken together, our 
technique rotates (with anti-aliasing) rasters faster than 
previous methods. The general derivation of rotation 
also sheds light on two common techniques: small angle 
rotation using a two-pass algorithm, and three-pass 
90-degree rotation. We also provide a comparative 
analysis of Catmull and Smith’s method [Catm80] and
a discussion of implementation strategies on frame 
buffer hardware. 
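
The remark that the blending ratios remain constant for each scan-line can be illustrated with a short sketch. It assumes grey-scale pixels stored as doubles and a non-negative skew; the function name shear_scanline and its interface are hypothetical, and the code is an illustration of the idea, not the implementation developed in this paper. The skew is split into a whole-pixel part and a fractional part, and every pixel on the line gives the same fraction of its intensity to its right-hand neighbour.

```c
#include <assert.h>
#include <stddef.h>

/* Shift one scan line to the right by `skew` pixels (skew >= 0),
   blending adjacent pixels with a ratio that is constant along
   the line. dst must have room for width + (int)skew + 1 pixels. */
void shear_scanline(const double *src, int width, double skew,
                    double *dst)
{
    assert(src != NULL && dst != NULL && width >= 0 && skew >= 0.0);
    int skewi = (int)skew;         /* whole-pixel part of the skew */
    double skewf = skew - skewi;   /* fractional part, same for all x */
    double carry = 0.0;            /* spill from the previous pixel */

    for (int i = 0; i < skewi; i++)
        dst[i] = 0.0;              /* vacated pixels on the left */
    for (int x = 0; x < width; x++) {
        double spill = src[x] * skewf;    /* goes to the next pixel */
        dst[x + skewi] = src[x] - spill + carry;
        carry = spill;
    }
    dst[width + skewi] = carry;    /* last spill falls off the end */
}
```

With the line {4, 8} and a skew of 1.25 the result is {0, 3, 7, 2}: the total intensity of 12 is preserved, and no per-pixel intersection tests are needed.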

STATEMENT OF THE PROBLEM 

A general 2D counter-clockwise rotation of the
point (x,y) onto (x',y') by angle theta ($\theta$) is
performed by multiplying the point vector (x,y) by the
rotation matrix:

\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]

The matrix is orthogonal: its rows and columns are mutually
perpendicular unit vectors, and the determinant is one. To
rotate a raster image, we consider mapping the unit cell
with center at location (i,j) onto a new location (i',j').




Figure 1. Rotation by Raster Sampling 

The image of the input cell on the output grid is a
cell with (usually) a non-integral center, and with a
rotation angle theta ($\theta$). We adopt a “box-filter”
sampling criterion, so the value of the output pixel is 
the sum of the intensities of the covered pixels, with 
each contributing pixel’s intensity weighted in direct 
proportion to its coverage (Figure 1). Note that the 
output pixel may take intensities from as many as six 
input pixels. Worse, the output pixel coverage of 
adjacent input pixels is non-periodic; this is directly 
related to the presence of irrational values in the 
rotation matrix. Clearly, the direct mapping of a raster 
by a general 2x2 matrix is computationally difficult: 
many intersection tests result, usually with no 
coherence or periodicity to speed program loops. 


ROTATION THROUGH SHEARING

Now consider the simplest 2x2 matrices which
may operate on a raster. These are shear matrices:

\[
\text{X-shear} = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix}
\qquad
\text{Y-shear} = \begin{bmatrix} 1 & 0 \\ \beta & 1 \end{bmatrix}
\]

Shear matrices closely resemble the identity
matrix: both have a determinant of one. They share no
other properties with orthogonal matrices. To build
more general matrices, we form products of shear
matrices; these correspond to a sequence of shear
operations on the raster. Intuitively, consecutive
shearing along the same axis produces a conforming
shear. This follows directly:

\[
\begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \alpha' \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & \alpha + \alpha' \\ 0 & 1 \end{bmatrix}
\]

Thus, shear products may be restricted to products
of alternating x and y shears, without loss of generality.
The product of three shears gives rise to a general 2x2
matrix in which three arbitrary elements may be
specified. The fourth element will take on a value
which ensures that the determinant of the matrix
remains one. This “falls out” because the determinant
of the product is the product of the determinants (which
are always one for each shear matrix). Orthogonal 2x2
matrices also have unit determinant, and may thus be
decomposed into a product of no more than three
shears.

\[
\begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \beta & 1 \end{bmatrix}
\begin{bmatrix} 1 & \gamma \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

Solving the general equation, we have
α = γ = (cos θ - 1)/sin θ; β = sin θ. The first equation is
numerically unstable near zero, but can be replaced by
the half-angle identity: α = γ = -tan(θ/2). Program
code to shear and update the point (x, y) with (x', y')
is then:

    /* X Shear */              /* Y Shear */
    x' := x - tan(θ/2) * y;    x' := x;
    y' := y;                   y' := y + sin θ * x;

When the output vector replaces the input, x=x'
and y=y', so the identity assignment in each shear
(y' := y in the x shear, x' := x in the y shear) may be
optimized out. Consecutive shears yield sequential
program steps. Thus, a three-shear rotation is achieved
by the three program statements:

    x := x + α * y;    [1]
    y := y + β * x;    [2]
    x := x + α * y;    [3]
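
The three statements can be checked numerically against the conventional rotation matrix. The following Python sketch is illustrative only; the function names `rotate_by_shears` and `rotate_by_matrix` are ours and do not appear in the paper. It assumes α = -tan(θ/2) and β = sin θ.

```python
import math

def rotate_by_shears(x, y, theta):
    """Rotate the point (x, y) by theta radians using three shears."""
    alpha = -math.tan(theta / 2.0)  # x-shear coefficient (first and third pass)
    beta = math.sin(theta)          # y-shear coefficient (second pass)
    x = x + alpha * y               # [1] x shear
    y = y + beta * x                # [2] y shear, uses the updated x
    x = x + alpha * y               # [3] x shear, uses the updated y
    return x, y

def rotate_by_matrix(x, y, theta):
    """Rotate the point (x, y) with the conventional 2x2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y
```

Both functions should agree to within floating-point rounding for any angle at which tan(θ/2) is finite.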


With θ ≈ 0, Cohen [Newm79] uses steps [1] and
[2] to generate circles by plotting points incrementally.
His derivation begins by choosing α and β to
approximate the conventional rotation matrix, and then
points out that by reassigning x + α*y to the original
variable x in [1], and not to a temporary value x', the
determinant becomes one, and the circle eventually
closes. Our analysis demonstrates formally why this is
true: rewriting the variables constitutes a shear, and the
sequence of shears always maintains a determinant of
one. Augmenting the code with line [3] would convert
the two-axis shear into a true rotation: the circle
generator would then produce points rotated through a
constant angle relative to the preceding point. This is
important should the algorithm be used to produce
circle approximations as n-gons (and not point
drawings), where θ = 360/n is no longer small.
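
The difference between the two-shear and three-shear generators can be sketched in Python. This is a hedged illustration, not Cohen's code: the names `two_shear_step`, `three_shear_step` and `ngon`, and the small-angle parameter `eps`, are assumptions made for the example.

```python
import math

def two_shear_step(x, y, eps):
    """One step of the incremental circle generator, steps [1] and [2] only."""
    x = x - eps * y      # [1] x shear; the updated x is reused below
    y = y + eps * x      # [2] y shear
    return x, y

def three_shear_step(x, y, theta):
    """One true rotation by theta radians, steps [1], [2] and [3]."""
    alpha = -math.tan(theta / 2.0)
    beta = math.sin(theta)
    x = x + alpha * y
    y = y + beta * x
    x = x + alpha * y
    return x, y

def ngon(n, radius=1.0):
    """Vertices of a regular n-gon produced by repeated three-shear rotation."""
    theta = 2.0 * math.pi / n
    pts = [(radius, 0.0)]
    for _ in range(n - 1):
        pts.append(three_shear_step(pts[-1][0], pts[-1][1], theta))
    return pts
```

The two-shear step has unit determinant, so its orbit closes, but only the three-shear step keeps every vertex at the same radius when the angle is large.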

RASTER SHEARING 

Raster shearing differs from point transformation 
because we must consider the area of the unit cell 
which represents each pixel. Fortunately, the shear 
operation modifies the pixel location with respect to 
only one axis, so the shear can be represented by 
skewing pixels along each scan-line. This simplifies the 
intersection testing that must go on to recompute the 
intensity of each new output pixel. 



Figure 2. Raster Shearing Along the X Axis

In general, the unit square P(i,j) on row i is
rewritten as a unit parallelogram with side of slope 1/α
on row i, with the former displaced by αy pixel widths.
This displacement is not usually integral, but remains
invariant for all pixels on the ith scan-line. For
illustration and implementation, it is represented as the
sum of an integral and a fractional part (“f” in Figure
3; “skewf” in Figure 6). Those pixels covered by this
parallelogram are written with fractional intensities 
proportional to their coverage by the parallelogram. 
The sum of all these pixels must equal the value of the 
original input pixel, as they represent this input pixel 
after shearing. 

We next approximate this parallelogram of unit 
area with a unit square. Placing the edges of the 
square through the midpoints of the parallelogram, we 
produce an exact approximation when the 
parallelogram covers two pixels, but not when it covers 
three. This approximation is the basis for our rotation 
algorithm. As we shall see, it can be implemented as a 
very efficient inner-most pixel blending loop, thus 
offsetting the cost of making three shearing passes, as 
compared to previous techniques, which employ two 
less efficient (though more general) passes. 


Figure 3. The Parallelogram Approximation (figure labels: α; “areas identical”)

Based on this filtering strategy, we consider two 
approaches to rotation. First, we seek angles θ for
which the filtering is exact. Second, we analyze the
filter for arbitrary values of θ where the filter may not
be exact. 

Filtering is exact when all parallelograms overlap 
no more than two pixels. This will always occur when 
the shear offset is of length 1/n, as a periodic cycle of
n parallelograms results, in which each spans exactly
two pixels. Choosing this ideal filter for the first and
third passes, we derive the second pass shear value.
Setting α = -1/n, we have θ = 2 tan⁻¹(1/n) and thus, by
manipulation of inverse trigonometric functions,
β = 2n/(1+n²). Setting n=1 yields α = -1, β = 1, and thus
rotations with θ = 90 degrees are exact (not possible
with the Catmull-Smith approach). Moreover, this 
shearing matrix never generates fractional values 
(implying no anti-aliasing must take place), so 90 
degree rotation may be coded as a three pass pixel 
“shuffle”. 
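
The 90 degree case can be illustrated on integer pixel coordinates. The sketch below is ours (the name `rotate90_by_shears` is hypothetical) and works on coordinate pairs rather than on a raster; it only shows that α = -1, β = 1 never leaves the integer lattice.

```python
def rotate90_by_shears(points):
    """Rotate integer lattice points by 90 degrees with alpha = -1, beta = 1."""
    out = []
    for x, y in points:
        x = x - y   # first x shear, alpha = -1
        y = y + x   # y shear, beta = 1
        x = x - y   # second x shear, alpha = -1
        out.append((x, y))
    return out
```

Each input (x, y) maps to (-y, x) using integer additions only, so no blending is required.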


Other choices of n yield exact sampling in the two
α passes, as α = -1/n. Here β = 2n/(1+n²) (in the middle
pass) will never be of the form 1/m for n > 1, so some filtering
artifacts will be present. However, we can form small
rational values for α and β corresponding to various
angular rotations, and create specialized filters, in
which only the β pass generates small errors on a
periodic basis. When α and β are small rationals of the
form i/j, then the shear values (which are used as
blending coefficients by our algorithm) will recur every
j scan-lines, leading to customized algorithms. Solving
for general rational values of α and β, we find that
α = -i/j and β = 2ij/(i²+j²). These tabulated values
give rise to highly efficient filters, with approximation
errors minimized:


      α        β        θ (degrees)

     -1        1        90.00
     -3/4      24/25    73.74
     -2/3      12/13    67.38
     -1/2      4/5      53.13
     -1/3      3/5      36.87
     -1/4      8/17     28.07
     -1/5      5/13     22.62


Figure 4. Rotation by a Rational Shear 
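
The tabulated values can be regenerated with exact rational arithmetic. The following Python sketch is an illustration under the relations α = -i/j, β = 2ij/(i²+j²) and θ = 2 tan⁻¹(i/j); the function name `rational_shear` is ours.

```python
from fractions import Fraction
import math

def rational_shear(i, j):
    """Return (alpha, beta, theta in degrees) for alpha = -i/j."""
    alpha = Fraction(-i, j)
    beta = Fraction(2 * i * j, i * i + j * j)
    theta = math.degrees(2.0 * math.atan2(i, j))
    return alpha, beta, theta

# Print one row per (i, j) pair used in the table.
for i, j in [(1, 1), (3, 4), (2, 3), (1, 2), (1, 3), (1, 4), (1, 5)]:
    alpha, beta, theta = rational_shear(i, j)
    print(f"{str(alpha):>5} {str(beta):>6} {theta:6.2f}")
```

Because β = sin θ, the sine of each computed angle should match the rational β to within rounding.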


We now consider arbitrary choices of θ, and then
the precision of the rotation. For θ > 90 degrees, our
shear parallelogram may span four pixels, and the
filtering rapidly breaks down. Based on the four-fold
symmetry of a raster, we may restrict our attention to
rotations of no more than 45 degrees, where our
approximation has worst-case performance (because α
and β grow monotonically in magnitude for 0 < θ < 90 degrees). Here
α = 1 - √2 ≈ -0.4142 and β = √2/2 ≈ 0.7071. The
second (β) pass is the most error-prone.

Probabilistically, its filter is exact 29.3% of the 
time. Otherwise, the parallelogram spans three pixels, 
and the error, as a function of fractional pixel skew, 
grows quadratically to a cusp, reaching its worst-case 
error when the parallelogram is symmetric about the 
output pixel. This error is √2/8 or 17.7%. However,
the sampling tile shifts as the shear value changes with 
each scan-line, so average degradation to the sheared 
raster is computed by integrating over parallelograms of 
all possible skew. Solving the equations, we find that 
the worst-case shear filter approximates intensities to 
within 4.2% of the actual intensity. For rotations less
than 45 degrees, the approximation is even closer, as the
probability of the parallelogram spanning three
pixels decreases. Where it does, the error terms are
also smaller.

Figure 5. Approximation Error

The nature of the error is to concentrate 
intensities from a center pixel whereas the true box- 
filter approximation calls for contributing coverages 
from two neighboring pixels. Thus, the approach 
“peaks” the data: the nature of the degradation is not 
random. Further, a reasonable implementation of the 
filter guarantees that when any scan-line is skew- 
sheared by a fractional amount, the contributing 
intensities of each input pixel sum to 1.0 — the filter 
parallelograms never overlap. If we consider the sum 
of the pixel intensities along any scan-line, this sum 
remains unchanged after the shear operation. Thus, 
the algorithm produces no visible shifts in intensity, and 
introduces no “holes” during rotation. The only 
rotation artifacts discernible appear with high-frequency 
data (such as lines of single pixel width), and even then 
only after magnification. This property is shared 
generally with rotation and translation algorithms which 
must resample such “sharp” rasters onto non-integral 
pixel boundaries. 
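
The quoted error figures can be reproduced with a short numerical experiment. The Python sketch below rests on our own reading of the analysis, which the text does not spell out: the error is measured as the excess weight given to the center output pixel, each slanted edge sweeps a horizontal extent β, and the fractional position of the parallelogram is uniformly distributed. The names `centre_error` and `error_statistics` are hypothetical.

```python
import math

def centre_error(u, beta):
    """Excess weight given to the center pixel by the unit-square approximation.

    u is the signed distance from the midpoint of a slanted parallelogram
    edge to the nearest pixel boundary. The edge crosses that boundary only
    when |u| < beta/2; otherwise the parallelogram spans two pixels and the
    square approximation is exact.
    """
    half = beta / 2.0
    if abs(u) >= half:
        return 0.0
    # Each of the two slanted edges misplaces (half - |u|)^2 / (2*beta).
    return 2.0 * (half - abs(u)) ** 2 / (2.0 * beta)

def error_statistics(beta, samples=100000):
    """Return (fraction exact, worst error, mean error) over one pixel period."""
    total = 0.0
    exact = 0
    worst = 0.0
    for k in range(samples):
        u = (k + 0.5) / samples - 0.5   # boundary offset sampled over [-0.5, 0.5)
        e = centre_error(u, beta)
        total += e
        worst = max(worst, e)
        if e == 0.0:
            exact += 1
    return exact / samples, worst, total / samples
```

Under these assumptions the closed forms are 1 - β for the exact fraction, β/4 for the worst case and β²/12 for the mean, which for β = √2/2 agree with the 29.3%, 17.7% and 4.2% quoted above.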

IMPLEMENTATION 

Scan line shearing is approximated by a blending 
of adjacent pixels. In the following code segment, the 
“pixmult” function returns a pixel scaled by a value 
skewf, where 0 ≤ skewf < 1 is a constant parameter for
all “width” passes through the inner-most loop: 

    PROCEDURE xshear(shear, width, height)
    FOR y := 0 TO height-1 DO
        skew := shear * (y+0.5);
        skewi := floor(skew);
        skewf := frac(skew);
        oleft := 0;
        FOR x := 0 TO width-1 DO
            pixel := P(width-x, y);
            left := pixmult(pixel, skewf);
            /* pixel - left = right */
            pixel := pixel - left + oleft;
            P(width-x+skewi, y) := pixel;
            oleft := left;
        OD
        P(skewi, y) := oleft;
    OD

Figure 6. Shearing Algorithm for X Axis 
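
For readers who wish to experiment, a runnable Python sketch of the same two-pixel blend follows. It is not a transliteration of Figure 6: it writes into a separate output buffer instead of shearing in place, walks each scan-line from left to right, and assumes that a positive skew moves pixels toward +x. The function name `xshear` is kept, but the return convention (the output raster plus the x coordinate of its first column) is our own.

```python
import math

def xshear(image, shear):
    """Shear a grey-scale raster along x with two-pixel blending.

    image is a list of equal-length rows of numbers. Returns (out, x0):
    out is the sheared raster and x0 is the x coordinate, in input
    pixel units, of its first column.
    """
    height = len(image)
    width = len(image[0])
    # Skew of every row, used first to size the output buffer.
    skews = [shear * (y + 0.5) for y in range(height)]
    lo = min(math.floor(s) for s in skews)
    hi = max(math.floor(s) for s in skews)
    out_width = width + (hi - lo) + 1      # one extra column for the fraction
    out = [[0.0] * out_width for _ in range(height)]
    for y, skew in enumerate(skews):
        skewi = math.floor(skew)
        skewf = skew - skewi               # 0 <= skewf < 1, constant per row
        carry = 0.0                        # part of the previous pixel spilling right
        base = skewi - lo
        for x in range(width):
            pixel = image[y][x]
            spill = pixel * skewf          # the single multiply per pixel
            out[y][base + x] = (pixel - spill) + carry
            carry = spill
        out[y][base + width] = carry
    return out, lo
```

Each input pixel is split into exactly two parts that land in adjacent output pixels, so the sum along every scan-line is unchanged apart from floating-point rounding.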


This operation shears a raster of size (width, 
height) by the value present in “shear”, so the data 
matrix P must be of sufficient width to accommodate 
the shifted output data. Note that only “width” output 
entries are written, so the skewed output line may be 
written to frame buffer memory modulo the frame 
buffer pixel width, thus requiring no additional 
memory, but complicating the specification of data to 
the three shear passes. A virtual frame buffer 
implementation which provided a notion of “margins” 
to active picture detail can maintain this offset 
information implicitly. 

A shear operation always has an axis of shear 
invariance (it is in fact an eigenvector). In this
implementation, the axis is the pixel boundary “below” 
the final row of pixel data at a distance “height”. This 
gives rise to rotation about the interstices between pixel 
centers. To rotate rasters about pixel centers, the “0.5” 
half-pixel offset may be removed. 

The code splits each pixel into a “left” and 
“right” value using one multiply per pixel; left and 
right always sum exactly to the original pixel value, 
regardless of machine rounding considerations. The 
output pixel is then the sum of the remainder of the 
left-hand pixel, plus the computed fractional value for 
the present (right-hand) pixel. The “pixmult” function 
reduces to a fractional multiply or table lookup 
operation with monochromatic images. More 
generally, it may operate on an aggregate pixel which 
might contain three color components, or an optional 
coverage factor [Port84]. Because read and write
references to P occur at adjacent pixel locations during 
the course of the inner-most loop, pixel indexing can be 
greatly optimized. 

On machines lacking hardware multiply, code to 
shear a large (512x512) image may build a multiply 
table at the beginning of each scan-line, and then use 
table lookup to multiply. By skew symmetry, 
x-shearing of line -n and line n are identical, save for 
shear direction, so one table may be used for two scan-lines,
or for every 1024 pixels. With a pixel consisting
of three 8-bit components, the table length is 256, and 
table fetches will exceed table loads by a factor of 12. 
Since the table can be built with one addition per 
(consecutive) entry, its amortized cost per lookup is 
low, and decreases linearly with raster size. 
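
The table-driven multiply can be sketched as follows. This Python fragment is illustrative: the 16-bit fixed-point fraction, the truncating rounding, and the names `build_multiply_table` and `pixmult_table` are assumptions made for the example and are not taken from the microcode.

```python
def build_multiply_table(skewf, entries=256, frac_bits=16):
    """Table of v * skewf for v = 0..entries-1, built with one add per entry."""
    step = int(round(skewf * (1 << frac_bits)))  # skewf as a fixed-point constant
    table = []
    acc = 0
    for _ in range(entries):
        table.append(acc >> frac_bits)           # integer part of v * skewf
        acc += step                              # one addition per consecutive entry
    return table

def pixmult_table(pixel, table):
    """Scale each 8-bit component of an (r, g, b) pixel by table lookup."""
    return tuple(table[c] for c in pixel)
```

For a 512-pixel scan-line pair with three components per pixel, the table is consulted 2 x 512 x 3 = 3072 times against 256 entries built, which is the factor of 12 mentioned above.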

Framebuffers are beginning to incorporate integer 
multiply hardware, often targeted to pixel blending 
applications (The Adage/Ikonas frame buffers at 
Waterloo’s Computer Graphics Laboratory provide a 
16-bit integer multiply in hardware). This speeds the 
evaluation of the pixel blending; the majority of the 
inner-loop overhead is in (un)packing the 24-bit RGB 
pixel to provide suitable input for the multiplier. 
Fortunately, the addition used to complete the blend 
may be done as a 24-bit parallel add, because the 
values to be summed, “left” and “right”, have been 
scaled by frac and 1-frac respectively. Thus, the
blending operation is “closed”, and no carry can
overflow from one pixel component into the next. 
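
The closed blend can be demonstrated on packed 24-bit words. The sketch below is a hedged Python illustration rather than the frame buffer microcode; the helper names and the choice of an 8-bit integer fraction `frac8` (the blend weight times 256) are ours.

```python
def pack(r, g, b):
    """Pack three 8-bit components into one 24-bit word."""
    return (r << 16) | (g << 8) | b

def unpack(word):
    """Split a 24-bit word into its (r, g, b) components."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

def scale(word, frac8):
    """Scale every component by frac8/256, with 0 <= frac8 <= 255."""
    return pack(*((c * frac8) >> 8 for c in unpack(word)))

def blend(previous, current, frac8):
    """Output pixel: the fraction of previous plus the remainder of current."""
    left_of_previous = scale(previous, frac8)
    # Every scaled component is no larger than the original: no borrow occurs.
    right_of_current = current - scale(current, frac8)
    # Component sums never exceed 255, so one 24-bit add suffices.
    return right_of_current + left_of_previous
```

Because the two weights sum to one, each component of the result stays within 0 to 255 and the three channels can share a single adder.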

Finally, the shear code may more generally be 
used to introduce spatial translation of the raster. By 
introducing an output offset in the shear code, a 
“BitBlt” [Inga78] style operation may be included at
no extra cost. In this setting, “skewi” and “skewf”
would have integral and fractional offsets added to 
them to accommodate the lateral displacement of the 
raster. Displacement during data passes two and three 
provides arbitrary displacement on the plane, with 
orthogonal specification of the displacement parameters. 

More generally, when the code is incorporated 
into a larger package which provides arbitrary (affine) 
matrix operations on a raster, the composite of all
intermediate image transformations is represented in
one matrix. This avoids unnecessary operations to the 
image. Eventually, this matrix is decomposed into 
three operations: scaling, rotation and shearing (plus an 
optional translation if a 3x3 homogeneous matrix is 
used). The shearing, rotation and possible translation 
operations may be gathered into one three-shear 
operation. The scale pass prefaces the rotation if it 
scales to a size larger than 1:1, otherwise it follows the 
rotation. This maximizes image quality and minimizes 
data to the shear (and possibly rotate) routines. Other 
four pass scale/shear sequences are discussed in the
literature [Weim80].

COMPARISONS

As with the Catmull-Smith approach, the 
algorithm may be implemented as a pipeline for real-time
video transformation. Both approaches require
two “rotators” to transpose the data entering and 
leaving the second scan-line operator, as this step 
requires data in column (and not row) order. 

In that approach, two scan-line passes (by x, then 
by y) are made upon the input raster. These may be 
modeled by the matrix transformation: 


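\[
\begin{bmatrix} 1 & 0 \\ \tan\theta & \sec\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]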


These slightly more general matrices perform a
simultaneous shear and scale along one axis, while 
leaving the second axis unchanged. This approach 
saves one data pass, but incurs the penalty of more
complex scan-line sampling. 
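
A two-pass decomposition of this kind can be checked numerically. The Python sketch below assumes an x pass that scales by cos θ and shears by -sin θ, followed by a y pass that shears by tan θ and scales by sec θ; the function name `two_pass_rotate` is ours.

```python
import math

def two_pass_rotate(x, y, theta):
    """Rotate (x, y) with an x pass followed by a y pass (shear plus scale)."""
    x = math.cos(theta) * x - math.sin(theta) * y   # x pass: scale and shear
    y = math.tan(theta) * x + y / math.cos(theta)   # y pass uses the new x
    return x, y
```

Unlike the pure shears used in the three-pass method, both passes here rescale the scan-line, which is why the fractional sample position changes from pixel to pixel.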

Because sample pixels are both sheared and 
scaled, no pixel-to-pixel coherence of fractional 
sampling location exists. Thus, each pixel must be 
sampled at two fractional locations, doubling the 
number of pixel (aggregate RGB) multiplies for each 
pass. Hand analysis of our microcode showed that this 
is already the dominant operation in the pixel loop.
Further, the Catmull-Smith approach must additionally
recompute the fractional sample points for each next 
pixel, or approximate their location using fixed-point 
arithmetic. In our implementation, fractional sampling 
points are constant per scan-line, and are calculated 
exactly in floating point at the beginning of each line. 

Compared generally to other work, our algorithm 
finds application where a generalized “BitBlt” operation 
is needed to perform rotation and translation efficiently. 
More complex pixel sampling passes may justify their 
added expense in allowing for generalized rotation
operations, such as Krieger’s modified two-pass
approach [Krie84] used to perform 3-D rotation with
perspective transformation, useful in texture mapping. 

CONCLUSIONS 

The technique outlined here performs arbitrary 
high-speed raster rotation with anti-aliasing and 
optional translation. The mathematical derivation 
guarantees scaling invariance when rotating. The 
implementation strategy allows for particularly fast 
operation, while minimizing the approximation error. 
This algorithm is a powerful tool in the repertoire of
digital paint and raster manipulation systems. Coupled 
with state-of-the-art raster scaling techniques, it can 
transform an input raster by an arbitrary 2x2 
transformation matrix in near real time.

ABSTRACT

We present a comprehensive formal model for a 
computer paint system providing capabilities beyond
those of traditional designs. The system incorporates 
an alpha channel to enable artwork to have variable 
opacity in a manner reminiscent of “cel painting.” 
Operations which may be performed on these RGBA 
images include digital painting, airbrushing, erasing, 
masking, and image compositing. These are 
implemented as instances of the digital compositing 
algebra introduced by Duff and Porter. Our 
implementation model extends a proposal by Tanner, et 
al. It is cost-effective and is based on the concept of a 
virtual frame buffer containing a higher-level 
description of the image being painted and an 
associated output transformation that maps the contents 
into a standard RGB frame buffer used only for 
viewing. Ways of implementing the model to take 
advantage of multiprocessing capabilities in various host 
and frame buffer architectures are discussed and three 
implementations are examined. 

Keywords: brush, cel, digital compositing, mask, 
multiprocessor, output transformation, paint, virtual frame 
buffer, RGBA. 


RÉSUMÉ

We present a formal model describing an electronic
colour palette program (a “paint system”) offering
capabilities that go beyond those of traditional models.
The system includes an alpha channel that allows
artwork to have variable opacity, resembling the
animation technique known as cel painting. The
operations that can be performed on these RGBA
images include digital drawing, airbrushing, erasing,
masking, and image compositing; these operations are
implemented following the digital compositing algebra
described by Duff and Porter. Our model pursues an
idea of Tanner et al. and proves inexpensive. It is based
on the concept of a virtual frame buffer containing a
“high-level” description of the artwork and an
associated transformation. This transformation converts
the description into an RGB format used exclusively for
display on an ordinary frame buffer. We discuss ways of
implementing the model according to the parallel
execution capabilities of different computers and frame
buffers. Three implementations are analysed.

The first author's current address is: National Film Board 
of Canada, Studio A. French Animation, Box 6100, Station 
A, Montreal, Quebec, Canada H3C 3H5, Tel: (514) 283- 
9309.

INTRODUCTION

The first computer paint system was written soon 
after the first frame buffer came into existence. A wide 
variety of styles and techniques have been implemented 
since then for a diversity of hardware configurations. 
This paper will introduce a formal model of a paint 
system based on an artist’s conceptual model similar in 
many respects to traditional cel animation techniques, 
but extended to capture new degrees of freedom 
available to animators through the use of digital 
computers. 

The work reported here is part of a joint project of 
the Computer Graphics Laboratory and the National 
Film Board of Canada. In consultation with members 
of the NFB’s French Animation Studio in Montreal, the 
goal is to build a production-quality paint system. The 
system has been designed and two prototypes have been 
implemented. Since the French translation for “paint
program” is palette de couleurs électronique, the system
has been dubbed Palette.

Palette is intended not only for painting 
backgrounds but also for direct animation. In direct 
animation, an image or physical model is changed 
incrementally and re-photographed to create each 
successive frame [LAYB79]. In order to reduce the
effort required in this labour-intensive process, Palette is
designed to combine the direct animation potential of a 
typical paint program with the advantage of cel 
animation: composite images whose component parts are 
re-usable to save duplication of effort in those parts of 
the image that remain constant from frame to frame. 
In cel animation, this is of course achieved by painting 
each part of the image on a separate sheet of 
transparent acetate called a “cel.” 

Palette operates upon “digital cels,” that is, 
RGBA images [PORT84], that may have been painted
with the program, digitized from photos or hand-drawn 
pictures, or produced by other computer graphics 
rendering techniques. In this respect, its functionality 
(though by no means its performance) is similar to the 
Pixar Compositor [LEVI84]. 

The notion of a virtual frame buffer with an 
associated output transformation is central to the 
implementation model. Tanner et al. [TANN83a]
have described the speed and potential cost advantages
of implementing RGB paint programs using a virtual
frame buffer as a cache in host memory separate from
the hardware frame store used for viewing the image.
Frame buffer values need never be read back to the
host and the frame buffer hardware has much less
stringent speed and depth requirements. [TANN83a]
tends to present the concept as a better way of
implementing existing applications. The paint system
described here is an implementation of that proposal
which enjoys the benefits cited, but it demonstrates that
another aspect is also important. Separating a
“working” description of the image (in a virtual frame
buffer) from the viewing of the image (in a display
frame buffer) affords the opportunity of extending the 
model of a paint system well beyond the set of features 
supported by today’s hardware architectures. 

The following sections present details on the 
artist’s conceptual model (a brief “user’s manual” for 
Palette), the formal model of a virtual frame buffer that 
instantiates the artist’s conceptual model, the techniques 
used to implement the augmented brushing styles 
proposed in the conceptual model, and a brief 
discussion of some implementation issues that arise 
when the formal model is mapped onto specific graphics 
hardware (in this case three specific multi-processor 
configurations that are being used as prototype 
implementations of Palette). 

THE ARTIST’S CONCEPTUAL MODEL 

This section provides an overview of Palette as 
seen by its intended user, an artist. The workstation 
layout (Figure 1) consists of a tablet with stylus as the 
primary input device, a colour monitor on which the 
image being rendered is previewed, and an 
alphanumeric terminal with keyboard. The monitor 
displays the current image and a set of menus for 
selecting operations. In the prototype, additional 
commands and parameter specifications are made 
through the use of the alphanumeric terminal. A more 
comprehensive tablet-based menu system is planned for 
the production system. 



Figure 1. The Workstation Layout for Palette. 


Central to the conceptual model of Palette is the 
notion that the work surface on which the artist draws 
or paints is a transparent plane called a cel. The term 
comes from the acetate layers used in animation which 
were at one time made of “celluloid.” In conventional 
2 1/2-D cel animation, each frame is created by 
photographing a stack of cels laid upon opaque 
background artwork on an animation camera stand 
[MADS69]. Because cels are transparent except for
areas where they have been painted, the photographic 
process results in a composite image. Employing this 
cel concept in a paint program permits creation of 
images by composition* and provides artwork with the
additional property of variable opacity. The former
property facilitates not only computer animation but
also a digital form of the “layout and paste-up” process
which is fundamental to graphic arts. Figure 2
illustrates the artist’s conceptual model of painting with




Figure 2. The Artist’s Conceptual Model. 

The image visible on the monitor during a Palette 
session is the composition of two planes, the foreground 
and the backdrop. The foreground has variable opacity 
while the backdrop is opaque. The foreground level is 
initially transparent. The artist paints on the 
foreground level as though it were a “cel.” The 
backdrop may be loaded with a uniform colour or an 
arbitrary image (including a composition of previously 
painted cels). The composition of the foreground “cel” 
level and the backdrop is much like the effect of 
placing a single cel over a painted background in 
conventional animation — where the foreground has 
maximum opacity only it is visible, where the 
foreground has zero opacity the backdrop is fully 
visible, and for intermediate opacities the backdrop is 
partially visible through the foreground. 
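
The visible image can be sketched as a per-pixel blend of the two planes. The Python fragment below is illustrative only: the function name `composite_over`, the 8-bit opacity range with rounding, and the assumption that the foreground colour is stored unweighted by its opacity are ours, not taken from the paper.

```python
def composite_over(foreground, opacity, backdrop):
    """Blend one foreground pixel over an opaque backdrop pixel.

    foreground and backdrop are (r, g, b) tuples of 8-bit values; opacity is
    the 8-bit foreground opacity (0 = transparent, 255 = opaque).
    """
    return tuple(
        (f * opacity + b * (255 - opacity) + 127) // 255   # rounded weighted mean
        for f, b in zip(foreground, backdrop)
    )
```

At maximum opacity only the foreground is returned, at zero opacity only the backdrop, and intermediate opacities give a weighted mix of the two.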

The prime motivation for the backdrop image is 
the need for something to be visible wherever the 
foreground is transparent. In addition, while the 
foreground cel is being painted, the backdrop can be 
used to hold a reference image for sequence registration 
(e.g. the previous pose of a cartoon character) or for 
context (e.g. a background matte painting). 

Palette provides only full-colour (24-bit) fully 
antialiased brushes that have smooth edges and variable 
opacity determined by brush specifications under the
control of the artist. Essentially, a brush is just another
variable-opacity raster image, typically smaller than the 
image being painted. The painting operation itself is a 
sequence of applications of the brush to the foreground 
image using one of a variety of compositing formats to 
blend the two. Each brush has five orthogonal 
attributes called shape, stroke, colour, density, and 
operation. The artist creates his own brush by assigning 
attributes to each of the five properties. These 
attributes determine the effect of applying the brush to 
the foreground image. 

The shape property refers to the two-dimensional 
region of pixels (not necessarily connected) affected by 
a single imprint of the brush. Palette provides a variety 
of standard antialiased square and circle brush shapes 
automatically. Alternatively, the user may paint an 
arbitrary shape to be used as a brush. 

The stroke property determines the relationship 
between the motion and pressure applied to the stylus 
and application of the brush to the cel. When the 
stroke property has the attribute “stamp,” each press of 
the stylus causes a single composition of the brush with 
the foreground image. The attribute “repeat stamp” 
causes a succession of brush composites at a constant 
rate independent of the speed of stylus motion, thus the 
gap between brush imprints increases with the speed of 
the stroke. The “continuous” attribute produces a 
continuous antialiased stroke without the gaps of the 
“repeat stamp” stroke. The attribute “straightedge” is 
similar to “continuous” but always produces a straight 
stroke between positions indicated by momentary 
presses of the stylus. A sequence of such strokes is 
terminated by pressing the stylus at (or very near) the 
same position twice in succession. 

The colour property is the set of colours, one for 
each pixel within the brush shape, which is composited 
with the foreground image. The brush may be a single 
colour (the same at each pixel within the shape) or 
multiple colours. 

The density property is a weighting function that 
determines the extent to which a single application of 
the brush alters the image within its imprint. During 
painting, density represents the opacity at each pixel 
within the brush shape. A maximum density causes the 
colour of the brush to entirely overwrite the cel colour, 
whereas a zero density leaves the cel unchanged. 
Intermediate values cause a blending of the brush and 
image colours using the density as a weighting factor. 
Densities can be created automatically using constant, 
Gaussian, or cusp functions centered at the origin of 
the brush. A Gaussian density function, for example, 
produces an effect closely resembling conventional 
airbrush. 
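
A density map and its use as a blending weight can be sketched in Python. This is a minimal illustration under our own assumptions: densities are normalized to the range 0 to 1, the Gaussian is parameterized by a hypothetical `sigma`, and only the colour blend is shown (any effect of painting on the cel's opacity is left out). The names `gaussian_density` and `paint_pixel` are not from Palette.

```python
import math

def gaussian_density(size, sigma):
    """Square density map with values in (0, 1], peaking at the brush centre."""
    c = (size - 1) / 2.0
    return [
        [math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]

def paint_pixel(cel_colour, brush_colour, density):
    """Blend the brush into the cel colour, weighting the brush by density."""
    return tuple(
        density * b + (1.0 - density) * c
        for b, c in zip(brush_colour, cel_colour)
    )
```

A density of one overwrites the cel colour, a density of zero leaves it unchanged, and the Gaussian falloff gives the soft edge associated with an airbrush.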

The operation property specifies one of four modes 
of brushing: paint, erase, mask, or mask-erase. Paint
mode uses the brush density to control the blending of
the brush with the image.

Erase mode is a unique feature of Palette and
reveals part of the power of the underlying cel model. 
In this mode the colour of the brush is ignored, but the 
density is used to decrease the opacity of the 
foreground image each time the brush is applied. 
Thus, where the brush has maximum density, the 
foreground image becomes absolutely transparent, 
exposing the backdrop beneath, whereas repeated 
application of an intermediate density gradually 
“fades” artwork so that the backdrop becomes 
increasingly visible through it. A zero density erase 
brush leaves the foreground unchanged. 

Mask mode is analogous to graphic artists’ 
conventional practice of temporarily masking areas of 
artwork by means of paper, masking tape, or frisket to 
protect them from subsequent painting or airbrushing. 
Again, the colour of the brush is ignored, but the 
density of the brush determines the permeability of the 
mask associated with each pixel in the foreground 
image. A maximum-density brush creates an 
impenetrable mask. During subsequent painting or 
erasing the effect of a brush will be reduced according 
to the degree of mask present at each pixel. Masked 
areas may be set globally visible or invisible by the 
artist. If visible, presence of a mask is signified by a 
specified global mask colour. 

Mask-erase mode functions similarly to erase 
mode, but operates on the mask rather than on the 
foreground cel. It is used to reduce the permeability of 
a mask or to do away with it entirely. 
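
The erase, mask and mask-erase modes can be sketched with three small functions. The formulas below are assumptions chosen to match the behaviour described in this section (the section itself gives no equations); all quantities are normalized to the range 0 to 1, and the function names are hypothetical.

```python
def erase(opacity, density, mask=0.0):
    """Erase mode: fade a foreground opacity; the mask weakens the brush."""
    effective = density * (1.0 - mask)   # a full mask blocks the brush entirely
    return opacity * (1.0 - effective)

def apply_mask(mask, density):
    """Mask mode: build the mask up towards 1.0 (impenetrable)."""
    return mask + density * (1.0 - mask)

def mask_erase(mask, density):
    """Mask-erase mode: fade the mask itself, as erase fades the cel."""
    return mask * (1.0 - density)
```

With these rules a maximum-density erase makes an unmasked pixel fully transparent, repeated intermediate-density strokes fade it gradually, and a fully masked pixel is left untouched.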

Palette provides functions for clearing, loading, 
and saving the foreground, backdrop, and mask 
portions of an image, for merging the foreground and 
backdrop images, and for undoing the most recent 
operation applied to the image. Images are stored in a 
file format that permits a number of other tools in use 
at Waterloo to be used on images created with Palette. 
One is a general image manipulation package 
[PAET85] and another is a general compositing
package [KLAS85] implementing the full set of
operations proposed by Duff and Porter [PORT84].

This section has provided an overview of functions 
provided in the current implementation of Palette. A 
more complete description of the artist’s view of the 
final system is contained in the functional specification 
[HIGG83].

THE VIRTUAL FRAME BUFFER 

This section discusses the actual implementation 
of that conceptual model. The key idea which is 
introduced in Palette is the notion of a virtual frame 
buffer in which the foreground cel and the backdrop, as 
well as the mask, can be represented. On page 29 of 
their text [FOLE82], Foley and Van Dam note that the
traditional design philosophy of paint programs has
been that the image being painted is precisely the image
being displayed on the monitor; unlike the rest of
computer graphics, no other representation of the image
is maintained. They term this philosophy “what you 
see is what you get.” This approach has advantages in 
terms of performance but limits paint programs to 
functionality easily supported in available hardware. 
Under this philosophy, if its conceptual model is to be 
taken seriously, implementation of Palette would 
require a substantial investment in custom hardware. 
In addition to “on the fly” compositing hardware, the 
frame buffer would require a substantial number of bit 
planes in order to store the mask, the foreground cel, 
and the backdrop. 

The model presented here breaks with the
time-worn “what you see is what you get” approach to paint
program design. It is based on a cost-effective 
approach that separates the painting function from the 
viewing function by introducing a virtual frame buffer in 
which all painting operations are performed and an 
output transformation that maps the data in the virtual 
frame buffer into a form suitable for the video-refresh 
circuitry in a conventional frame buffer. The virtual 
frame buffer need not be accessible to the video output 
(in particular, it need be neither dual-ported nor 
accessible at standard video rates in excess of 100 or 
even 25 nanoseconds per pixel). Instead, it can be 
stored in memory that is more readily accessible to 
stroke rendering routines (either in cpu memory of the 
host or workstation or in a portion of the physical frame 
buffer not required for video refresh). Image data in 
the virtual frame buffer is not subject to the dictates of 
the video-refresh circuitry and can be stored in 
whatever format is most efficient for painting 
algorithms. By employing a higher-level description of 
the image and formally defining the compositing steps 
that transform that representation to viewable RGB 
values placed in the display frame buffer, we adopt the 
approach that has served the rest of computer graphics 
so well for so long. We are able to make significant 
simplification in the programmer’s view of the paint 
program while also freeing ourselves from the 
limitations of particular display hardware. 

A virtual pixel in Palette consists of 64 bits of 
information (Figure 3), 24 bits for each of the 
foreground and background images, 8 bits for the 
foreground opacity, 7 bits for the mask, and one 
additional contrast flag that is used to implement 
temporary feedback images [NEWM79] such as
position markers, grid guidelines, and bounding boxes. 
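
A minimal sketch of how such a 64-bit virtual pixel could be packed into a single integer. Only the field widths (24 + 24 + 8 + 7 + 1 = 64) come from the text; the field names and the order of the fields are assumptions made for illustration, and Figure 3 gives the layout actually used.

```python
# Hypothetical layout, most significant field first:
# foreground RGB (24) | backdrop RGB (24) | opacity (8) | mask (7) | contrast (1)
FIELDS = [("fg_rgb", 24), ("bg_rgb", 24), ("opacity", 8), ("mask", 7), ("contrast", 1)]


def pack_pixel(values):
    """Pack a dict of field values into one 64-bit integer."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_pixel(word):
    """Recover the field values from a packed 64-bit integer."""
    values = {}
    for name, bits in reversed(FIELDS):
        values[name] = word & ((1 << bits) - 1)
        word >>= bits
    return values


pixel = {"fg_rgb": 0xFF8000, "bg_rgb": 0x102030,
         "opacity": 200, "mask": 127, "contrast": 1}
packed = pack_pixel(pixel)
```

Keeping the layout in one table means that a different field order only requires editing `FIELDS`.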

Figure 3. The Virtual Pixel Used in Palette.

Palette’s output transformation defines the
mapping from these 64 bits to a standard R-G-B
representation of each pixel. The first step is to
composite the cel and backdrop images according to
Wallace’s formulation [WALL81]. If the artist has
indicated that masking is to be visible, the second step
is to check whether the pixel’s mask value is non-zero
and, if so, to composite the global mask colour over the
result of the first step using the mask density value as
the opacity. To ensure that visible masking does not
overly obscure artwork beneath it, the mask density is
scaled so that its opacity never exceeds one half. If
the pixel’s contrast flag bit is set, the final step is to
apply a contrasting function to the R-G-B value to map
it to a contrasting value that will distinguish the pixel
from its neighbors.
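
The three steps can be sketched as follows. This is an illustration under stated assumptions, not Palette's code: the backdrop is taken to be fully opaque, the blend is the usual linear "over" for unpremultiplied colours, the mask colour and the function names are hypothetical, and the contrasting function used is the add-one-half variant described in the next paragraph of the text.

```python
def over(top, bottom, alpha):
    """Blend an unpremultiplied colour over an opaque one; alpha is in [0, 1]."""
    return tuple(alpha * t + (1.0 - alpha) * b for t, b in zip(top, bottom))


def contrast(rgb):
    """Shift each component by one half, modulo 1."""
    return tuple((c + 0.5) % 1.0 for c in rgb)


def output_transform(fg, bg, opacity, mask, contrast_flag,
                     mask_colour=(1.0, 0.0, 0.0), show_mask=True):
    """Map one virtual pixel to a displayable (R, G, B) triple.

    fg, bg: colour triples in [0, 1]; opacity: 0..255; mask: 0..127.
    """
    rgb = over(fg, bg, opacity / 255.0)        # step 1: cel over backdrop
    if show_mask and mask != 0:                # step 2: visible masking
        density = 0.5 * (mask / 127.0)         # scaled to at most one half
        rgb = over(mask_colour, rgb, density)
    if contrast_flag:                          # step 3: feedback pixels
        rgb = contrast(rgb)
    return rgb
```

Because the mask density is halved before blending, even a fully masked pixel still shows half of the artwork beneath it.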

Assuming (R,G,B) is an R-G-B pixel value 
whose components are each in the normalized range 
[0,1], the contrasting function usually used in raster 
graphics is (1,1,1) - (R,G,B) [NEWM79]. This is
efficiently implemented as complementation of the
bitwise representation for the pixel. This effectively
complements the hue of a colour. For cases in which
the pixel colour is highly unsaturated and mid-intensity,
however, this function produces little apparent change.
The worst case is a pixel value of (0.5,0.5,0.5). In the
course of formulating our model, Tanner [TANN83b]
suggested an alternative function to change a colour by 
a consistent amount regardless of its original value. 
The function is (R,G,B) + (0.5,0.5,0.5) modulo 1 and 
is efficiently implemented by complementing only the 
high-order bit of each component value. 
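
The difference between the two contrasting functions is easy to see on 8-bit components. The sketch below assumes 8 bits per component; the function names are illustrative only.

```python
def complement(rgb):
    """(1,1,1) - (R,G,B) on 8-bit components: flip every bit."""
    return tuple(c ^ 0xFF for c in rgb)


def half_shift(rgb):
    """(R,G,B) + (0.5,0.5,0.5) modulo 1 on 8-bit components: flip only the high bit."""
    return tuple(c ^ 0x80 for c in rgb)


mid_grey = (128, 128, 128)
print(complement(mid_grey))   # (127, 127, 127): almost indistinguishable from the input
print(half_shift(mid_grey))   # (0, 0, 0): a change of half the full range
```

Flipping the high-order bit is the same as adding 128 modulo 256, so every component moves by exactly half the range whatever its starting value.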

Palette’s cels are based upon the full-colour digital 
representations of cels and backgrounds for animation 
described by Wallace [WALL81]. In addition to
R-G-B values, Wallace stores an opacity value which is
a compact approximation of the edge information at
each pixel. Wallace’s formula for associative pair-wise
composition of cels reduces the total number of
compositing steps in a sequence of frames by allowing
cels that remain adjacent from frame to frame to be
pre-merged. Duff and Porter [PORT84] introduce a
compositing algebra in which Wallace’s formula is only
one of a dozen operations which go beyond what is
possible with traditional cels. They call Wallace’s
“opacity” values “alpha” values and store their images
in a different format in which each of the R-G-B
components is pre-multiplied by the alpha value. Duff
[DUFF85] has extended that model to include a notion
of z-depth. 

Storing the brush and foreground cel as R-G-B 
values pre-multiplied by their opacities as recommended 
by Duff and Porter would require allocating additional 
bits for each channel in order to avoid severe roundoff 
error in the course of brush compositing. Our 
experiments indicate that at least twelve bits would be
needed for each of red, green and blue. Rather than 
substantially increase the storage required in the virtual 
frame buffer, the cel R-G-B values are retained in their 
unscaled form.

The reason originally advanced for storing R-G-B 
as pre-multiplied values is that the compositing 
operations can be performed more quickly in that 
format [PORT84]. Palette uses an alternate
formulation of compositing due to Hardtke that realizes
the same results with almost the same efficiency, while
permitting R, G, B, and opacity to be stored as
separate 8-bit values [HARD85].

The final issue concerning the virtual frame buffer 
is the frequency with which the output transformation 
must be applied. Conceptually, the virtual frame buffer 
is continually being transformed from its 64-bit internal 
representation to its 24-bit representation in the physical 
frame buffer. This is not possible because of the 
computational bandwidth required. The practical 
approach is to perform the output transformation at 
intervals on only those virtual pixels which have been 
modified. The details of this are very dependent upon 
the particular architecture upon which Palette is 
implemented. Examples are discussed in Section 5. 

BRUSHING TECHNIQUES 

The brushes used in Palette are relatively 
complicated objects. For efficiency a brush record is 
maintained that defines the five properties shape, 
stroke, colour, density, and operation. While stroke 
and operation are easily preserved as simple scalar 
values, the shape, colour and density information 
requires more elaborate data structures. Square and 
round brushes would be easy to handle, but rather than 
cater to special cases, brushes are kept in a general 
linked list data structure whose goal is to save storage 
and speed brushing by avoiding pixels that are not 
affected by the brush. 

Painting operations within the virtual frame buffer 
are implemented in a straightforward manner because 
the foreground cel and the backdrop are stored 
separately and only the foreground cel or the mask is 
modified by brushing. To implement the four types of 
operations, paint, erase, mask, and mask-erase, the 
brush, the mask, and the cel are treated as images 
which are combined in various ways using Duff and 
Porter’s compositing algebra. Using the terminology of 
[PORT84], the paint operation is simply the
compositing operation “(brush out mask) over cel.” The
erase operation is “cel out (brush out mask).” The mask
operation is “brush-density over mask” and mask-erase
is “mask out brush-density.” As has already been
mentioned, because Palette does not pre-multiply the
R-G-B components of its RGBA images by “A”
(alpha) as in [PORT84], it uses a slightly different
formulation of these compositing operations [HIGG86].
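
The four operations can be sketched for a single pixel with unpremultiplied colours. The formulas below are the textbook unpremultiplied forms of "over" and "out"; they are an assumption for illustration and are not the formulation of [HIGG86], which is not reproduced here. All values are floats in [0, 1], and the function names are hypothetical.

```python
def paint(brush_rgb, brush_a, mask, cel_rgb, cel_a):
    """(brush out mask) over cel; returns the new cel colour and opacity."""
    a = brush_a * (1.0 - mask)                  # brush out mask
    out_a = a + cel_a * (1.0 - a)               # opacity after "over"
    if out_a == 0.0:
        return cel_rgb, 0.0                     # nothing visible, colour is arbitrary
    out_rgb = tuple((b * a + c * cel_a * (1.0 - a)) / out_a
                    for b, c in zip(brush_rgb, cel_rgb))
    return out_rgb, out_a


def erase(brush_a, mask, cel_rgb, cel_a):
    """cel out (brush out mask); only the cel opacity changes."""
    a = brush_a * (1.0 - mask)
    return cel_rgb, cel_a * (1.0 - a)


def mask_paint(density, mask):
    """brush-density over mask."""
    return density + mask * (1.0 - density)


def mask_erase(density, mask):
    """mask out brush-density."""
    return mask * (1.0 - density)
```

Note that a fully masked pixel (mask of 1.0) leaves the cel untouched under both paint and erase, which is the behaviour the mask is meant to provide.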

IMPLEMENTATION ISSUES 

Practical issues remain to be addressed, such as 
how and when to update the display frame buffer to 
reflect modifications to the working copy in the virtual 
frame buffer. Such questions rely on the particular 
hardware chosen for the implementation. We thus 
conclude with a brief overview of three particular 
architectures on which versions of Palette have been or
will be implemented.

To date, versions of the system have been 
implemented on two hardware configurations and are 
planned for a third. The first implementation, a 
feasibility study, was on a Norpak VDP-1 frame buffer 
attached via a DMA link to a VAX 11/780 host 
computer running VMS. A Motorola 68000 
microcomputer was attached to the VDP-1 as a 
dedicated user-programmable display processor. Figure 
4 gives a schematic diagram of the system. This 
equipment is located at the National Research Council 
of Canada. 

In this implementation the entire virtual frame 
buffer and the undo buffer are located in the host 
VAX. The VAX is responsible for tablet sampling, all 
operations on the foreground cel and the backdrop, and 
the output transformation. The 68000 is responsible for 
maintaining tracking and writing R-G-B values into the 
frame buffer. Both processors are programmed in C. 


Figure 4. The VAX/VDP-1 Architecture

Because the entire virtual frame buffer is simply
an array residing in host memory, many of the
bottlenecks associated with traditional host-based paint
systems are overcome. Of particular importance is the
fact that the VDP-1 is used in a “write-only” manner;
the values stored in the hardware frame buffer are
never read back during painting. This is fortunate
because, like many commercial frame buffers, the
VDP-1 does not easily support the operation of
read-modify-write on a single pixel. This is the essence of
the inner loop of any paint program and it must be
made to execute efficiently.

In the VAX/VDP implementation, a stroke is
rendered by rubberstamping the brush at each pixel in a
straight line between each pair of sampled tablet
locations. Fishkin calls this the Naive algorithm
[FISH84] and notes that the excessive overlap of the
brush imprints causes each pixel in the stroke to be
“visited” multiple times by the renderer. The
bottleneck in this process is data transfer from the
VAX to the VDP-1. In order to minimize the amount
of data transferred, the VAX performs the output
transformation once on pixels in the “wake” of the
brush, that is, pixels that the renderer has finished
“visiting.” For a given brush shape, the eight possible
wake patterns are pre-computed. Each of these defines
those pixel positions in one imprint of the brush shape
that are unaffected by a second imprint offset from the
first by one pixel position. (See Figure 5.) After each
imprint of the brush shape into the virtual frame buffer
in the course of rendering a stroke, the VAX performs
the output transformation on the pixels in the wake
pattern corresponding to the direction offset between
the current imprint and the one before it. The resulting
RGB values, the current imprint position, and a
number identifying the wake pattern used are all
transferred to the graphics processor which writes the
pixel values into the frame buffer at the appropriate
positions based on its own copy of the specified wake
pattern. At the end of the stroke, the full brush shape
is used rather than a wake pattern. Figure 5 shows the
display buffer updates required for an example stroke.
The cross-hatchings indicate the wake pattern that
affects each pixel in the stroke.

A noteworthy feature of this implementation is
the fact that the VDP-1 at the National Research
Council has only seventeen bits per pixel. The actual
image displayed on the monitor is generated using the
high five bits of red, the high seven bits of green, and
the high four bits of blue. One bit is reserved for
tracking feedback. Although this is inadequate for the
final image, it is sufficient for viewing the image during
its creation. The full-precision version of the image
stored in the virtual frame buffer is the useful product.
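
The reduction from 24 bits to the seventeen-bit display pixel amounts to keeping the high-order bits of each component. In the sketch below the bit widths (5, 7, 4, plus one tracking bit) come from the text, while the position of each field within the word and the function name are assumptions.

```python
def to_display_pixel(r, g, b, tracking=False):
    """Reduce 8-bit R, G, B to a 17-bit word.

    Assumed layout, high to low: tracking (1) | red (5) | green (7) | blue (4).
    """
    word = r >> 3                      # high five bits of red
    word = (word << 7) | (g >> 1)      # high seven bits of green
    word = (word << 4) | (b >> 4)      # high four bits of blue
    return word | (int(tracking) << 16)
```

Since the full-precision image stays in the virtual frame buffer, this truncation affects only what the artist sees while painting.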

This first implementation proved the soundness of 
the artist’s conceptual model, but left much to be 
desired in the way of performance. Reasonable 
response was precluded by floating-point compositing
code, time-sharing the VAX, virtual memory paging by
the operating system, and a 2-millisecond overhead for
each system call transferring data to the frame buffer.

The second implementation is on an Orca3000
workstation comprising an MC68000 cpu running Unix,
a custom bit-slice graphics processor, and a 1024-line
8-bit frame buffer equipped with colour lookup tables
[ORCA83]. Figure 6 is a schematic diagram of the
system. The workstation used for Palette has 4
megabytes of host memory. Its frame buffer memory is
only addressable by the graphics processor. The
graphics processor is programmed in C using a
cross-compiler [GURD85a].



Figure 6. The Orca3000 Architecture.

Virtual memory paging by the operating system is 
not a concern in this case. The workstation’s main 
memory is ample and its operating system does not 
support virtual addressing. 

An intriguing aspect of the Orca3000 is the
manner in which inter-processor communication and 
data memory for the graphics processor are provided. 
The graphics processor has an interface to the 68000’s 
system bus and can access any location in host memory 
directly through the use of base and offset registers. An 
additional command/status register is used to
communicate with the graphics processor without
requiring it to generate contention on the 68000 system 
bus. 

Work is more equitably allocated between the two 
processors in the Orca implementation because the 
graphics processor assumes the burden of performing 
the output transformation. Custom microcode written 
in C performs the output transformation, the tracking 
function, and menu-handling on the graphics processor. 
The 68000 host performs all other functions, including 
maintenance of the virtual frame buffer. Rather than 
using the Naive algorithm to render strokes, the 68000 
uses a more efficient algorithm which visits each pixel 
only once. This approach uses Gupta and Sproull’s 
antialiased line rendering algorithm [GUPT81] to look
up appropriate values in Fishkin’s “Sweep” arrays,
which contain pre-convolved opacities for a stroke
[FISH84].

The “wake” patterns employed in the first 
implementation of Palette are discarded. Frame buffer 
updates are instead performed by means of a paging 
scheme whereby the virtual frame buffer is divided into 
rectangular blocks that are marked whenever they are 
modified. The output transformation is periodically 
applied to those blocks that have been marked since its 
last application. 


Paging the image to the display frame buffer 
speeds communication between the two processors and 
reduces the amount of data transferred. The virtual
frame buffer is split into blocks of 16x16 pixels. Each 
block has a corresponding dirty bit that is set by the 
68000 host whenever it modifies pixels in that block. 
After setting dirty bits, the 68000 also sets the value of 
the command/status register in order to signal the 
graphics processor which otherwise busy-waits on the 
register to avoid saturating the 68000’s system bus. 
When alerted, the graphics processor checks the dirty 
bits to find each block requiring the output 
transformation. Bits are checked in round-robin 
fashion to avoid looping on blocks that change 
frequently to the exclusion of the rest. Before starting 
the output transformation, the graphics processor resets 
the block’s dirty bit to avoid race conditions with the 
host. While the 68000 continues painting, the graphics 
processor accesses the block in host memory directly, 
performs the output transformation, and writes the 
resulting pixel values into the display frame buffer. 
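
The bookkeeping behind this paging scheme can be sketched in a few lines. The block size of 16x16 and the round-robin scan come from the text; the class and method names are hypothetical, and the sketch is single-threaded, so it shows the ordering of the steps rather than real two-processor synchronisation.

```python
BLOCK = 16  # block edge in pixels, as in the text


class DirtyBlocks:
    """Track which 16x16 blocks of a virtual frame buffer have been modified."""

    def __init__(self, width, height):
        self.cols = (width + BLOCK - 1) // BLOCK
        self.rows = (height + BLOCK - 1) // BLOCK
        self.dirty = [False] * (self.cols * self.rows)
        self.cursor = 0  # where the next round-robin scan resumes

    def mark(self, x, y):
        """Host side: record that pixel (x, y) was modified."""
        self.dirty[(y // BLOCK) * self.cols + (x // BLOCK)] = True

    def next_dirty(self):
        """Display side: return the next dirty block index, or None if all are clean.

        The bit is cleared before the caller transforms the block, so a
        modification made during the transformation marks the block again.
        """
        n = len(self.dirty)
        for step in range(n):
            i = (self.cursor + step) % n
            if self.dirty[i]:
                self.dirty[i] = False
                self.cursor = (i + 1) % n
                return i
        return None
```

Resuming the scan just past the last block served is what keeps a frequently repainted block from starving the others.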

In the Orca implementation, there is a mismatch 
in resolution between the 512x512x24-bit image 
produced by the output transformation described in 
Section 3 and the 1024x1024x8-bit frame store which
must display it. To overcome this problem, an
additional step involving digital halftoning is added at
the end of the output transformation to map each 24-bit 
RGB colour to a 2x2 array of 8-bit Orca pixels. In 
essence three bits of each pixel are allocated to a 
halftone version of the red portion of the image, three
to the green portion, and two to the blue portion. The 
halftone patterns are out of phase with each other to 
avoid moire effects and the values in the colourmaps 
are gamma-corrected for better antialiasing. 
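
One way such a halftoning step could work is sketched below. The 3-3-2 bit allocation and the 2x2 output block come from the text; the particular dither orders, the RRRGGGBB packing, and the function names are assumptions, and gamma correction through the colourmaps is omitted.

```python
# Hypothetical dither orders, one per channel and deliberately different so
# that the patterns are out of phase. Position k receives the extra step
# when ORDERS[channel][k] < extra.
ORDERS = {"r": (0, 2, 3, 1), "g": (3, 1, 0, 2), "b": (1, 3, 2, 0)}
BITS = {"r": 3, "g": 3, "b": 2}


def halftone_channel(value, bits, order):
    """Spread one 8-bit value over four samples of `bits` bits each."""
    levels = (1 << bits) - 1
    steps = (value * levels * 4 + 127) // 255   # value in quarter-level units
    base, extra = divmod(steps, 4)
    return [base + (1 if rank < extra else 0) for rank in order]


def halftone_pixel(r, g, b):
    """Map one 24-bit colour to four 8-bit pixels, each packed as RRRGGGBB."""
    rs = halftone_channel(r, BITS["r"], ORDERS["r"])
    gs = halftone_channel(g, BITS["g"], ORDERS["g"])
    bs = halftone_channel(b, BITS["b"], ORDERS["b"])
    return [(rv << 5) | (gv << 2) | bv for rv, gv, bv in zip(rs, gs, bs)]
```

The four samples of a 3-bit channel sum to one of 29 distinct totals rather than 8, which is roughly the two additional bits of intensity resolution discussed in the text.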

Trading spatial resolution for intensity resolution
in this manner is equivalent to an additional two bits in 
each of the red, green, and blue channels. Thus, the 
displayed image is effectively 512x512x14 bits. This 
scheme has worked very well. In side-by-side 
comparisons with full 24-bit images, differences are 
difficult to discern at normal viewing distances. Again, 
a certain amount of discrepancy is acceptable due to 
the distinction between previewing images during 
painting and the ultimate resolution required for 
photographing or video-recording finished artwork. 
These results support the contention of Tanner et al.
that the virtual frame buffer approach would permit 
construction of less expensive “24-bit” painting stations 
employing hardware frame buffers having fewer than 
24 bitplanes. 

The third implementation is not yet underway, 
but is worth considering briefly because it complements 
the first two approaches. The VAX/VDP-1 
implementation performs almost all of the calculations 
on the host with the frame buffer serving only for 
viewing. The Orca3000 implementation offloads all of
the output transformation to the graphics processor, 
while maintaining the virtual frame buffer within the 
host. A proposed implementation for the Adage/Ikonas 
RDS-3000 frame buffer will move even the virtual 
frame buffer to the graphics processor while 
maintaining only the basic tablet routines and high-level 
control in the “host” 68000 cpu. 

The reason for this is that the RDS-3000 supports 
a full 32-bit pixel and has a 1024x1024 display 
memory. Because only one fourth of that is needed for 
the viewing image, the other three-fourths can be used 
to store the entire virtual frame buffer. A 32-bit 
custom bit-slice (similar in many respects to Orca3000’s 
16-bit graphics processor) has sufficient computing 
power and high-bandwidth access to the display 
memory that we expect to be able to perform both the 
basic painting algorithm and the output transformation 
on the bit-slice without using the 68000 that is attached 
to the frame buffer. 

Because both the 68000 and the bit-slice are 
programmed in C (the bit-slice has a similar
cross-compiler [GURD85b]), we hope to move much of the
program from the Orca3000 to the RDS-3000 with little 
modification. 

ABSTRACT

Raster manipulation software is often viewed as an ad 
hoc means to fine-tune the appearance of digital 
images, or as a means to reformat them to conform to 
specific hardware requirements. A universally 
accepted, machine readable, device-independent 
specification of a raster image is seldom employed. 
This stands in contrast to the variety of “standards” for 
higher-level scene representation. We define a general 
raster “type”, which unifies the design of a toolkit of 
raster-based software. Operations performed by the 
tools are closed in the sense that operators map objects 
having the raster type onto new objects having the 
raster type. This closure encourages a synthesis of 
function by allowing composition of operators. 
Sequences of these operators are surprisingly powerful 
and have wide application. 

RÉSUMÉ

Les logiciels de manipulation d’images “raster” sont
souvent considérés comme un moyen ad hoc
d’améliorer l’apparence d’images digitales, ou comme
un moyen de les modifier de façon à ce qu’elles se
conforment à un appareil spécifique. La représentation
universelle d’une image “raster”, ne dépendant pas
d’une machine particulière, est rarement utilisée; ce qui
contraste avec le grand nombre de normes qui existent
pour représenter des images de plus haut niveau. Nous
définissons un type “raster” qui permet la création
d’une série d’outils opérant sur celui-ci. Les outils en
question forment un ensemble fermé dans le sens qu’ils
opèrent sur des images de type “raster” pour produire
des images de type “raster”. Cette fermeture permet la
création de fonctions par la simple juxtaposition
d’opérateurs plus simples. Ces compositions de fonctions
se révèlent étonnamment puissantes et ont un vaste
domaine d’applications.

Keywords: bitmap, digital compositing, imaging, raster. 

INTRODUCTION

The ultimate goal of any software system should 
be the creation of a harmonious set of tools in which 
each tool embodies a conceptually simple operation. 
This is true for the case of raster image manipulation, 
but such a set is not in widespread use. To have 
generic utility, each tool must operate on an abstract 
raster type. For instance, a “cropping” tool should trim 
rasters regardless of their dimension or pixel attributes. 
Additionally, the tool’s output should be, in all cases, a 
valid raster file so that tools may be composed 
arbitrarily. 

To achieve this, we define a universal file format 
and implement general raster access routines. With 
these, the creation and coding of each new tool is 
greatly simplified, and the proliferation of disposable
software can be alleviated. This scenario is also a boon 
to the user: generic tools imply a simple conceptual 
model. In some cases, they even suggest new ways of 
“plumbing” together raster operators. This approach is 
appealing in an academic/research environment, where 
creative experimentation is encouraged, but where 
software maintenance remains on a tight budget. 

This paper discusses the design and 
implementation of a comprehensive raster manipulation 
system, based on the raster file format, that has been 
operational for over a year and is the mainstay of 
raster-based activities within the Computer Graphics 
Laboratory at the University of Waterloo. In that time, 
it has completely subsumed the various ad hoc raster 
file formats previously in use and has provided a 
unifying framework for new research. 

OVERVIEW 

The toolkit contains programs to support abstract 
operations (rotation and scaling), as well as interfaces 
to a number of hardware devices and software systems. 
These include I/O tools for Adage/Ikonas and Raster 
Technologies frame buffers, the Apple/Macintosh, and 
Versatec and Imagen hardcopy printers. In the usual
setting, tools model the UNIX text/filter paradigm, 
whereby the output of any one tool may be piped 
directly to the input of the next. This is particularly
important when intermediate raster files may be quite 
large. 

The tools are written in standard C, use no 
assembly code or specialized C packages (such as yacc), 
and have been ported successfully to machines with 
different word length and severe compiler restrictions. 
Conditional compiler code is used to represent the 
specifics of byte order. This allows the system to 
maintain both a uniform presentation of data for the 
low-level tool builder and an identical external 
representation (as a byte stream) for disk files. 

Most users are not tool writers, but use the raster 
tools freely in a conceptual fashion. An artist working 
on the Macintosh might print bitmap files on the 
Imagen laser printer, or use them as picture input for 
the comprehensive Orcatech-based Palette [Higg85]
colour painting system. Here the user may disregard 
the different pixel precisions, (lack of) colour, or 
machine word-lengths, all of which differ for each 
hardware system. Because the file format has been 
designed carefully, it has become the format for 
exchange as well as for archival storage. 

We begin by identifying and evaluating the design 
criteria first for the underlying raster format, then for 
tools. The paper concludes by demonstrating the 
synthesis of a new function (digital halftoning) through 
application of the atomic tools. 

BACKGROUND 

The widespread availability of digital raster 
devices has spawned a large progeny of “raster 
formats”, often with no unifying design principles. It is 
not uncommon for a format to represent a digital 
“dump” (on external media), patterned after a device’s 
(or program’s) internal data structures. It can be 
argued that this raster format is finely tuned to the 
hardware characteristics of its respective device, with 
subsequent advantages in terms of run-time efficiency. 

Our findings do not support this argument. 
Rather than allowing the specifics and availability of 
hardware or software to drive our choice of design, we 
make an a priori raster file design, and then argue its 
advantages. We begin by identifying useful design 
criteria for both the file format and for the general 
software system in which it is employed. 

In contrast to some proposed formats, our design 
philosophy has been toward a universal file format 
which is “moderate” in providing sufficient attributes to 
model any display device, but without any unnecessary 
or redundant file attributes: it is minimal. This 
philosophy encourages the construction of tools which 
embody raster functions in as abstract a setting as 
possible. As a direct consequence, tools which deal 
with only a subset of valid raster files will not exist. 
This approach is a major departure from many other 
raster systems. 

DESIGN OF THE FILE FORMAT

Raster Specification and Operation 

No raster specification is present that might be 
ambiguously interpreted as a raster operation. Thus, 
“width” and “height” are essential raster specifiers; 
raster “orientation” is not, because the raster rotation 
function exists as a tool, and thus is not (by design) a 
specification. As a consequence, the raster rotation 
code belongs to a single tool, which aids in software 
maintenance. This model frees the user from the 
dilemma over choice of representation and tool 
application. In previous systems, a custom tool (e.g., a 
laser output tool) might accept rasters of only a specific 
orientation, based on speed considerations. 
Alternately, that output tool might provide a high-speed 
implementation of rotation independent of a raster 
rotation tool. In the first case, the user is left with a 
question of specification to guarantee operation of the 
printing tool. In the second, the locus of code which 
provides raster rotation is not well defined. 

Some scene representation languages take the 
opposite extreme. Functional specification is allowed in 
the most general sense. Here “tools” don’t exist per se;
their function is present in the interpreter which reads a
file. An example is the Xerox Interpress standard
[Spro81]. Here the general implementation of the file
format on a printer implies the existence of supporting 
code to perform rotation, rendering and all other 
operations potentially specified by the document. 
Because this interpreter is monolithic, operations are 
not free-standing programs. Thus, integration of new 
software is difficult for a diverse software community. 

In our experimental setting, we do not advocate 
that our raster file allow for “programming” in this 
sense: we envision situations where one function might 
be applied to many sets of data, and vice versa.
Non-radical manipulation of rasters capitalizes heavily on
this separation, and we insist on it. In our setting, our 
well-defined files embody the raster data, and a toolkit 
of machine-executable files (or UNIX shell scripts) 
embodies the raster operations. 

Pixel Specification 

The format provides for the formal specification 
of a pixel, which allows generic tools (such as crop or 
rotate) to operate on arbitrary data sets, with 
independence both from raster dimension and pixel 
specification. The importance of this should not be 
underestimated. Historically, formats allow for a 
maximum of three or four pixel components (often not
even within the same file, but as “separates”). Pixel 
precision is usually taken from a small set, such as one, 
eight, and twelve bits. Experience has shown that it is 
impossible to predict a priori what or how many 
attributes comprise a pixel — new models are not to be 
discouraged. Besides the obvious RGB colour 
components, traditional data sources often carry
multispectral data or Z-depth information. The last few
years have brought “alpha” coverage factors and
sub-pixel masks to the forefront.

For instance, the Orcatech-based Palette system 
treats pixels as a sixty-four bit quantity, by encoding 
both foreground and background primary colour 
information, plus other parameters including masking 
and transparency, thus modeling an artist’s use of paint.
Our format embraces this experimental system easily. 
It is worth noting that the system has both a different 
word length and integer byte order than the original 
VAX implementation, but this is entirely invisible to 
the tool creator. Images created by Palette may be 
moved using standard tools (UUCP) to the VAX and 
rendered on VAX-based graphics engines. 

Syntactically, we define pixels as collections of 
“fields”, used to identify the components, in a manner 
analogous to the record structure type in languages such 
as Pascal. The pixel attribute is recorded in the file 
header as a text string. Pixel components consist of an 
alphabetic identifier and an associated integer which 
defines the field precision (up to thirty-two bits per 
component). The identifier is occasionally used 
externally to specify pixel components to certain 
software tools. For example, imextract merges and 
extracts pixel components from multiple input files into 
an output raster, based on the user-specified field set. 
The precision component is rarely presented to software 
tools, as the low-level routines allow correct arithmetic 
operation across files of differing pixel precision (but 
usually with conforming field names). This is a 
function unique to our package, and enhances general 
compositing of files from diverse sources. 

Interpretation of Pixel Data 

The interpretation of the data fields is at the 
user’s discretion. In many cases, data is taken to span 
the closed interval [0..1]. This interval is closed 
under multiplication and complementation. The low- 
level tools provide a data presentation level which 
returns floating-point values for pixel components on 
the range [0..1], so the actual field precisions can be
kept invisible. This interval is consistent with the
design of a number of colour spaces such as RGB,
CIELAB and HSB [Smit78], in which the three
independent colour axes are placed within the interval
[0.0..1.0].

Unfortunately, many software tools in existence 
wrongly (often implicitly) use the interval [0..1). 
The latter follows directly when software employs bit 
shifts to map between pixels with differing numbers of 
significant bits. In that model, for a one bit pixel image
(to take the worst case, albeit a very common one),
[0..1) allows only the intensity values 0.0 and 0.5.
When taken to higher significance, the binary value .1
becomes .1000... This system never allows “full-on” to be
represented. 
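
The defect is easy to reproduce. The sketch below widens a one-bit "white" sample to eight bits by shifting and interprets both on the half-open interval; the function names are illustrative only.

```python
def shift_up(value, from_bits, to_bits):
    """Widen a sample by left-shifting, the mapping criticised above."""
    return value << (to_bits - from_bits)


def intensity_half_open(value, bits):
    """Interpret a sample on the half-open interval [0..1)."""
    return value / (1 << bits)


white_1bit = 1
white_8bit = shift_up(white_1bit, 1, 8)      # binary 10000000, i.e. 128

print(intensity_half_open(white_1bit, 1))    # 0.5
print(intensity_half_open(white_8bit, 8))    # 0.5
```

The shift is at least consistent, since both precisions agree on 0.5, but the brightest one-bit value is only half intensity and no sample at any precision ever reaches 1.0.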


A useful mapping has two important properties: 
reconstruction and representation. Reconstruction
means that data can be mapped into any higher 
precision, and when subsequently mapped back to the 
original precision, reconstructs the original data exactly. 
It is not hard to see that bit shifts are lossless operations 
and therefore have this useful property. Representation 
means that pixel data of lower precision can be mapped 
to a system of higher precision, with the pixel values 
mapping exactly onto identical intensity values. In 
general, perfect representation is not possible when 
moving to higher systems, but it can be achieved in 
many cases, while providing reconstruction universally. 

The proper approach regards the interval as being 
of length $2^n - 1$. In general, our mapping always 
provides exact values for intensities 0.0 and 1.0, so 
our interval of representation is the closed interval 
[0.0..1.0]. Note that binary (one bit) data in our 
system represent 0.0 and 1.0 exactly. Adoption of 
this system means replacing bit shifts (multiplies and 
divides by $2^m$) by general multiplying and dividing. 
This is not a severe speed penalty. In practice, a 
scaling table can be constructed and a lookup operation 
used to find the appropriate mapped value. Our 
method also provides for reconstruction, because a scale 
up of one bit provides 2n+1 new bins, where n existed 
before, and uniform distribution means that no two 
values collide. 
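
The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, and rounding to the nearest integer is an assumed choice that the text does not specify. The bit-shift mapping is included only for comparison.

```python
def rescale(value: int, from_bits: int, to_bits: int) -> int:
    """Map a pixel component between precisions over the closed interval [0..1]."""
    from_max = (1 << from_bits) - 1   # interval length 2^n - 1
    to_max = (1 << to_bits) - 1
    # Integer form of round(value * to_max / from_max), rounding halves up
    # (the rounding rule is an assumption of this sketch).
    return (2 * value * to_max + from_max) // (2 * from_max)


def shift_up(value: int, from_bits: int, to_bits: int) -> int:
    """The bit-shift mapping, shown only for comparison."""
    return value << (to_bits - from_bits)


def scaling_table(from_bits: int, to_bits: int) -> list:
    """Precompute the mapping so that conversion becomes a table lookup."""
    return [rescale(v, from_bits, to_bits) for v in range(1 << from_bits)]
```

With these definitions a one bit "on" pixel maps to 255 in eight bits under rescale, but only to 128 under shift_up. Mapping any value up with rescale and back down again returns the original value.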

Exact representation is possible whenever 
m is a factor of n. To illustrate this, 4 is a 
factor of 12, so we assert that four bit data has an 
exact representation in a twelve bit system. To prove 
that $2^4 - 1$ is a factor of $2^{12} - 1$, express them as bit 
streams: ‘111111111111’ can be divided by ‘1111’ 
giving ‘000100010001’, or 273. Thus, 
4095=15*273, and the representation for white is still 
exact. More generally, multiplying any value in the 
four bit system by 273 yields exact representation in the 
twelve bit system. 
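
The divisibility argument is easy to check mechanically. The helper below is a hypothetical illustration (its name is ours), returning the integer multiplier when one exists:

```python
def exact_scale_factor(m: int, n: int):
    """Return k such that (2**m - 1) * k == 2**n - 1, or None if no integer k exists."""
    quotient, remainder = divmod((1 << n) - 1, (1 << m) - 1)
    return quotient if remainder == 0 else None
```

For m=4 and n=12 this yields 273, whose binary form is 100010001, matching the bit-stream division above. For m=5 and n=12 no integer multiplier exists, so exact representation is not available.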

Textual Header 

Another departure from many “standard” raster 
file formats is the exclusive use of case-independent, 
human-readable text within the header. The use of 
small “binary” headers with magic word values is still 
common. Yet in raster files, the header typically 
constitutes less than 1% of the total storage. The 
advantage we gain is a parser made common to all user 
software (and thus is part of the low-level raster 
primitives). Because our header is minimal, this is a 
simple task. Direct viewing of the attributes of a file 
means merely viewing the first few lines of it — no 
special tool is used. 

Because both the header and raster data are 
represented by a byte stream (giving machine 
independence), we mandate that no “alignment” 
specifications to the raster be made — the physical 
raster immediately follows the textual header. 
Experimentation with UNIX-based systems indicates 
that non-alignment to disk boundaries makes almost no 
difference to software throughput, particularly where 
the blocking size on disk transfers is large. 

The representation of our header data structure in 
human-readable ASCII text is a trend increasingly 
common in good software practice. The design of the 
highly-successful CIF2.0 by Sproull and Lyon [Hon80] 
as a VLSI exchange format mandated use of ASCII to 
allow electronic mailing of design geometries. The 
format has gained widespread acceptance outside this 
realm, as it can be implemented easily on machines of 
differing character representations and word precisions. 
Sproull previously designed the Xerox AIS [Baud77] 
raster format (replete with binary header information), 
and now argues convincingly [Spro83] that this trend 
toward textual representation should be universally 
adopted, even where the need for exchange is of 
secondary importance. 

Compact Representation 

Archival storage of raster images relies on data 
compaction. Because we desire a universal format 
requiring no explicit (de)compression steps, our basic 
format must provide for compression as part of the pixel 
specification, and implement this operation as part of 
the basic access routines. 

After two attempts at general data compression, 
we chose a “compaction” operation, which operates on 
a level beneath pixel specification, and immediately 
above the level of physical data movement to external 
media. Our compaction scheme is a general run length 
coder, which replaces identical runs of n bytes with n+1 
bytes of code, representing the original run, plus a 
count in the range [1..256]. We choose the term 
“compaction” over “compression”, as the operation 
may take place without regard to pixel boundaries. 
Early experiments indicate this may have value where 
images containing data that is cyclic across a scan-line 
(halftoned images, stipple patterns) are to be encoded. 
The compression size can then be set to span a 
collection of adjacent pixels. 
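
A run length coder of this kind might be sketched as follows. The record layout is an assumption of the sketch, not the format's actual encoding: each record holds the n-byte unit followed by one count byte, with counts 1..256 stored as 0..255. The function names are ours.

```python
def compact(data: bytes, unit: int = 1) -> bytes:
    """Replace each run of identical unit-sized groups with the group plus a count byte."""
    if unit < 1 or len(data) % unit:
        raise ValueError("data length must be a multiple of the unit size")
    out = bytearray()
    i = 0
    while i < len(data):
        pattern = data[i:i + unit]
        count = 1
        j = i + unit
        # Extend the run while the next group matches, up to 256 repetitions.
        while count < 256 and data[j:j + unit] == pattern:
            count += 1
            j += unit
        out += pattern
        out.append(count - 1)  # assumed layout: counts 1..256 stored as 0..255
        i = j
    return bytes(out)


def expand(code: bytes, unit: int = 1) -> bytes:
    """Invert compact() for the same unit size."""
    if unit < 1 or len(code) % (unit + 1):
        raise ValueError("code length must be a multiple of unit + 1")
    out = bytearray()
    for i in range(0, len(code), unit + 1):
        out += code[i:i + unit] * (code[i + unit] + 1)
    return bytes(out)
```

The unit size matters for cyclic data. A scan-line that alternates two byte values never forms a run at unit size 1, so the output doubles in size, whereas at unit size 2 the same line collapses to a single three-byte record per 256 repetitions.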

General and Special Cases 

The raster proper is encoded in a manner which 
maximizes speed of raster (un)packing by aligning pixel 
groups onto regular boundaries. Although the specifics 
are detailed, this underlying code guarantees packing 
efficiencies of more than 84% for pixel sizes up to 12 
bits, approaching 100% for arbitrarily large pixels. This 
design choice minimizes overhead in the data extraction 
loop, as the shift and mask values are constant over the 
data set. 


The criteria set forth above allow “special cases” 
to fall out directly from the more general specification, 
without special caveats being coded in. This is 
intentional. For instance, the external representation of 
a 640x480 raster of twenty-four bit RGB pixel 
values is quite simple: a textual header, followed by 
640*480*3 bytes of data, arranged in R,G,B order, by 
scan-line, without any padding. Although we don’t 
advocate that tools write rasters independently of the 
low-level software which defines the header 
specification, it does indicate the simplicity and 
generality of our approach. For instance, a videotex 
station could dump out a hard-coded header string, 
followed by a byte dump of its screen contents, thus 
creating a well-formed “canonical” raster format file. 

TOOL DESIGN 

General Philosophy 

Brooks’ findings [Broo75] show that, as a rule, 
long-lived systems consist of software which outlives the 
intention of its original use. Because we cannot 
anticipate the user’s ultimate needs or goals with the 
raster tools, we should choose to craft each tool into an 
atomic, composable function with no implicit 
assumptions of the user’s intentions. From this 
“metaobjective”, a number of practical considerations 
immediately become clear. 

Consistency of Design 

Overall design consistency leads to tremendous 
ease of use for both implementor and user. In 
particular, the time spent in learning the tools used for 
simple operations becomes proportional to the user’s 
objectives; what little “start-up overhead” exists is 
common to all tools, and need not be relearned! 
Similarly, the tool designer can fashion a new tool 
based on the existing package, thus implicitly inheriting 
uniformity of the user interface (such as commonly 
used command line switches) and operation. 

As an example of consistency, all software tools 
dealing with the concept of an axis-aligned rectangle 
specify this entity in terms of origin and size, not as 
diagonally opposed corners. In contrast, corner-based 
specifications leave the ambiguity of semi-open intervals 
for the user to resolve. For instance, the corner 
specification model might describe a 512x512 display 
as spanning the region (0,0) to (511,511), whereas 
our model describes the display as a window of size 
(512,512) with origin location (0,0). Thus, we 
remove the burden of potential “off by one” errors; in 
fact the casual user will probably be unaware that any 
ambiguity could exist. This “correctness by design” is 
a very powerful concept in implicitly steering the user 
along a correct path of conceptualization. 
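
The origin-and-size convention can be illustrated with a small Python type. The class and method names are hypothetical and are not part of the toolkit:

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """An axis-aligned rectangle given by origin and size, never by corners."""
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        return self.width * self.height

    def contains(self, px: int, py: int) -> bool:
        # The size fixes the extent directly; no corner convention is needed.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        """Return the overlapping rectangle, or None if there is no overlap."""
        x0, y0 = max(self.x, other.x), max(self.y, other.y)
        x1 = min(self.x + self.width, other.x + other.width)
        y1 = min(self.y + self.height, other.y + other.height)
        if x1 <= x0 or y1 <= y0:
            return None
        return Rect(x0, y0, x1 - x0, y1 - y0)
```

A 512x512 display is then Rect(0, 0, 512, 512). Its pixel count is the product of its sizes, with no "plus one" correction for the caller to remember.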

Minimal Atomic Set 

The tools are atomic, composable functions which 
deal with raster data in the most abstract way 
conceivable for each respective function. They strongly 
resemble Guibas’s concept of a bit map calculus 
[Guib82] with the accompanying language MUMBLE. 
Our implementation provides for pixels of arbitrary 
precision, as do his; our “language” consists of UNIX 
commands in which tools play the part of keywords to 
perform manipulations on the data. A compiler for a 
large subset of MUMBLE using the toolkit as “machine 
code” would be a straightforward exercise. 

Just as computer languages advocate a small 
number of composable keyword constructs, we 
encourage the user to synthesize function from tools 
already within the kit. When this fails, he should seek 
the most general tool necessary to extend the coverage 
of the tool set to contain this specific operation. Besides 
allowing for a new function with the least amount of 
new software, a minimal addition to the toolkit can be 
very revealing to the deep structure of the problem. 

THE TOOLS 

Although space does not permit a description of 
each tool, we may summarize the operation of the 
toolkit. It is useful to characterize classes of tools by 
common operation. These form our taxonomy. 

Storage Considerations 

Because tools are composable, they operate on 
data presented serially. However, some tools require 
internal raster storage to perform their intended 
operation. We classify these as “level 0” (constant 
pixel storage needed), “level 1” (constant scan line 
storage needed), and “level 2” (arbitrary storage). In 
each case, these are worst-case raster storage 
requirements. A generous number of tools belong to 
classes 0 and 1. Sequences of operators may therefore 
manipulate images on secondary storage whose size 
exceeds system main memory. 

Input/Output Characterization 

Because tools are operators, we may classify them 
as “unary”, “binary” or “tertiary” tools, based on the 
number of simultaneous inputs used. Two additional 
classes: “source” and “drain” represent tools which 
interface between the toolkit universe and some other 
means of representation. These include display output 
tools, text input tools and pattern generators. With the 
exception of drain tools (these typically render images), 
a tool will have one “standard” output in the form of a 
universal image file. This class characterization is 
formally specified in the source code of each software 
tool, thereby activating library routines common to all 
tools within this class. Such code governs the number 
of expected input files and enables command line 
switches generic to that class. 

Other Characterizations 

A final characterization is whether a tool 
preserves pixel integrity. Most tools do, and this is 
important when a tool is used to manipulate non-intensity 
pixel fields, particularly when the tool is used 
in settings far removed from traditional imaging 
applications. Cropping can be performed on data which 
contains Z (depth) information, but a low-pass filtering 
of such data is not intuitive because the latter does not 
preserve pixel integrity. At present, few tools which 
violate pixel atomicity exist. Pixel-preserving functions 
fit in well with our design philosophy of generic tools. 

TOOLS BY EXAMPLE - HALFTONING 

The following examples demonstrate a series of 
experiments used to perform digital halftoning (the 
creation of bi-level images) from high resolution 
sources. The presentation is an idealized “lab session”, 
but it is also a recapitulation of the historical 
development of the tools. 

Experiment #1 — Plate #1 

We begin by halftoning through simple 
thresholding. We envision thresholding as a binary tool 
which does a test for magnitude of its two inputs. The 
source tool imconst is used to provide a reference 
level for the secondary input. Generally, thresholding 
can be modeled as a subtract operation, with a 
subsequent test to map x>0 values into white, else 
black. This last function already exists as imtomask. 
We thus perform the test a>b as (a-b)>0 stepwise on 
the image file containing a “milkdrop”: 

imconst -d milkdrop.im -v 128 | 
imsubtract milkdrop | imtomask >out 

Experiment #2 — Plate #2 

Thresholding to a uniform, constant value 
produces poor (as expected) images, so we write a 
program imhalftone to do ordered dithering, 
comparing input pixels against a cyclic, periodic set of 
dynamic threshold points to vary the thresholding 
[Jarv76]. The program is not clean: the 4x4 matrix of 
threshold weights is hard coded, and the software must 
permute the array internally to conform to the arbitrary 
widths and heights of the input file. 

imhalftone milkdrop.im >out 

Experiment #3 

The halftone results look good, but we need 
avenues of further exploration. The permuted internal 
weight table replication code is in fact a “tile” 
operation first conceived for use with much larger 
images. We write imtile. As a bonus, the tiler is 
fast, and includes offset switches to be fully general. 
These correspond to phase shifts in the halftoning dot, a 
desirable feature in colour digital halftoning, in which 
halftone screens for successive colour separations are 
staggered with respect to previous screens. The 
threshold table is now recoded as the file kernel.im. 

imtile kernel.im -w 128 -h 128 | 
imsubtract milkdrop | imtomask >out 
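
The tile, subtract and mask steps can be mimicked in Python on nested lists. This is a sketch only. The function names merely echo the tools and are not the toolkit's code, and the 4x4 threshold matrix is the standard ordered-dither (Bayer) pattern, an assumption made here because the text does not list the actual weights.

```python
# Standard 4x4 ordered-dither indices (assumed; the paper's weights are not given),
# scaled to thresholds for eight bit pixels.
BAYER_4X4 = [[0, 8, 2, 10], [12, 4, 14, 6], [3, 11, 1, 9], [15, 7, 13, 5]]
KERNEL = [[16 * k + 8 for k in row] for row in BAYER_4X4]


def tile(kernel, width, height, dx=0, dy=0):
    """Replicate the kernel to the requested size, with optional phase offsets."""
    kh, kw = len(kernel), len(kernel[0])
    return [[kernel[(y + dy) % kh][(x + dx) % kw] for x in range(width)]
            for y in range(height)]


def subtract(a, b):
    """Pixel-wise difference a - b of two rasters of equal size."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_mask(image):
    """Map positive values to white (1) and everything else to black (0)."""
    return [[1 if p > 0 else 0 for p in row] for row in image]


def halftone(image, kernel, dx=0, dy=0):
    """Ordered dither expressed as tile, then subtract, then mask."""
    thresholds = tile(kernel, len(image[0]), len(image), dx, dy)
    return to_mask(subtract(image, thresholds))
```

A flat mid-grey input turns on half of the output pixels, and the dx and dy offsets shift the phase of the dot pattern without changing the coverage.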

Experiment #4 

The 4x4 kernel, now represented as kernel.im 
in Experiment #3, was cumbersome to make. We had 
also planned software which would do a “text dump” 
of raster files; here we wish to do the opposite: convert 
the “dump” into a raster representation. Realizing that 
the scope of this software is more than merely one of 
diagnostic service, we write both imtabin and 
imtabout. The hard coded ASCII constants of 
imhalftone have now found a niche. One can 
envision a standard tool sequence (e.g. a UNIX shell 
script) to halftone arbitrary images against a textually 
encoded table of weights. 

imtabin -w 4 -h 4 -p n8 >kernel.im 

Experiment #5 — Plate #3 

The typing of constants in Experiment #4 
suggests a mechanized means to generate random 
numbers, and we are curious to see the appearance of 
such output. The creation of an array of random 
numbers (not specific to halftoning) is all that is 
needed: the other code is already in place. We rewrite 
a copy of imconst which substitutes random values 
for constants. The -default switch in our example 
borrows the dimensions and pixel specification of 
milkdrop.im, so that imrandom can produce 
conforming output. The output shows superimposed 
high-frequency noise [Robe62], resembling the grain in 
film emulsions “pushed” too far during development. 

imrandom -d milkdrop | 
imsubtract milkdrop | imtomask >out 

Experiment #6 — Plate #4 

The results inspire the use of thresholds with 
Gaussian distribution. We recall that the “coin 
tossing” method [Kalb79] generates such sets by 
averaging small sequences of evenly distributed 
numbers. This requires binary operators other than 
subtract (such as average and sum), so we extend the 
scope of imsubtract. The tool is renamed imaop 
because it now allows for arbitrary arithmetic operations 
from two input sources. We also rediscover that 
rerunning imrandom generates identical values, so we 
employ imcrop to give us a set of different random 
numbers. 

imrandom -d milkdrop -h 512 >master 
imcrop master -y 0 -h 128 >m1 
imcrop master -y 128 -h 128 >m2 
imcrop master -y 256 -h 128 >m3 
imcrop master -y 384 -h 128 >m4 
imaop m1 m2 -op aver >t1 
imaop m3 m4 -op aver >t2 
imaop t1 t2 -op aver >gauss 
imaop milkdrop gauss -op sub | imtomask >out 
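
The same synthesis can be sketched in Python: one seeded random raster is cropped into four bands, which are averaged pairwise. All names here are illustrative, and Python's random module stands in for whatever generator imrandom uses.

```python
import random


def random_image(width, height, seed=0, levels=256):
    """Uniform random raster; the same seed always yields the same values."""
    rng = random.Random(seed)
    return [[rng.randrange(levels) for _ in range(width)] for _ in range(height)]


def crop(image, y, height):
    """Extract a horizontal band of scan-lines starting at row y."""
    return [row[:] for row in image[y:y + height]]


def average(a, b):
    """Pixel-wise integer average of two rasters of equal size."""
    return [[(pa + pb) // 2 for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def gaussian_thresholds(width, height, seed=0):
    """Average four independent bands of one master raster, two at a time."""
    master = random_image(width, 4 * height, seed)
    m1, m2, m3, m4 = (crop(master, i * height, height) for i in range(4))
    return average(average(m1, m2), average(m3, m4))


def variance(image):
    """Population variance of all pixel values, used to compare spreads."""
    values = [p for row in image for p in row]
    mean = sum(values) / len(values)
    return sum((p - mean) ** 2 for p in values) / len(values)
```

Averaging four uniform samples concentrates the values around mid-grey, so the variance of the result is markedly lower than that of a single uniform band. Four samples give only a rough approximation to a Gaussian.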

Experiment #7 

The brevity of most command lines is pleasing, 
but imtomask is ever-present. Because it is a unary 
operator immediately following imaop in each case, we 
extend imaop to include a thresh function. The 
code integration is trivial: three additional lines and a 
new switch statement label. As a bonus, the 
thresholding works “automatically” for colour files — a 
feature unanticipated at the outset. Thus, imhalftone 
has now been made obsolete. 

imaop milkdrop gauss -op thresh >out 

Conclusions 

The evolution of the imaging software 
demonstrates a few important principles. For one, 
generality of function allowed a means to verify 
hypotheses, without any programs having to be written. 
As a clearer understanding of the desired goal 
emerged, specific tools were crafted. For instance, we 
created random numbers with Gaussian distribution by 
a simple synthesis of operation, thus allowing the user a 
glimpse into their properties. Should this become a 
desirable feature to support in general, -gauss or 
-seed switches may be added to imrandom, but for 
current applications this is unnecessary. 

These examples show that when a new operation 
is sought, it can often be melded into the function of a 
tool in existence, thus widening the scope of operation 
for the original tool. This creates a tremendous synergy 
of function. By studying why more than one path 
toward a goal exists, we can both pare down what 
constitutes a minimal set, and simultaneously get new 
insights into the deep structure of the problem. 

Abstract 

The development of Adagio, a robotics simulation 
workstation, has involved the implementation of several 
techniques unique to the system. Based on the 
message-passing, multitasking multiprocessor realtime 
operating system Harmony, Adagio is programmed using a 
large number of cooperating tasks. Several techniques are 
based on the concept of a server, a task that is alone 
responsible for governing a scarce resource. The Graphics 
Server task, the Data Structure Server task, and the Tracker 
Server task are responsible for the management of the frame 
buffer, the 3D geometric data structure and the screen 
tracker, respectively. Each of these servers is then a 
separate tool necessary for the implementation of the whole 
system. Each runs in parallel with other tasks and can handle 
requests for service from any task. 

Résumé 

The development of Adagio, a workstation dedicated to 
robot simulation, has led to the implementation of 
original techniques. Based on Harmony, a realtime, 
multitasking, multiprocessor operating system 
(itself based on message passing), Adagio has been 
programmed using a large number of cooperating 
tasks. Several of these techniques are 
based on the concept of a server (the server is the only task 
responsible for a given resource). The "Graphics 
Server", "Data Structure Server", and "Tracker Server" tasks 
manage the frame buffer, the 3D geometric 
database, and the screen cursor, respectively. Each of 
these servers is thus a separate tool, necessary for the 
implementation of the complete system. Each runs in 
parallel with other tasks and can handle requests 
coming from other tasks. 


Keywords: robot simulation, realtime, multitasking, 
multiprocessing, message passing, server task, frame buffer, 
windows, 3D geometric data, user interfaces, screen tracker. 

Introduction 

Adagio is a robotics simulation workstation currently 
under development at the National Research Council. The 
workstation, when completed, will give the user the 
capability of creating and manipulating 3D objects in a robot 
environment, of specifying the robot task, and of viewing 
the results of a robot simulation. As such it will certainly be 
usable for many other applications that can make use of a 3D 
window-based near realtime display with extensive 
interaction capabilities. 

The use of the Harmony operating system, a multitasking 
multiprocessor realtime message-passing system, as a base, 
has led to different approaches to the software architecture 
of an interactive graphics system. This paper discusses 
several of these approaches. 

Three servers, a Graphics Server, a Data Structure 
Server, and a Tracker Server follow an idea common in 
multitasking systems; i.e. each is solely responsible for a 
specific scarce resource. The Graphics Server is charged 
with the maintenance of the frame buffer, while the Data 
Structure Server maintains the 3D geometric representation 
of the robot and its environment. The Tracker Server 
communicates with the Tablet Server to provide continuous 
tablet tracker echoes. It is particularly well suited for a 
multiwindow system and provides richer user feedback than 
is currently available on most systems. 


Adagio Overview 

Goals 

Adagio [Tann85b] is a workstation being developed to 
support research in intelligent robotics. It is intended to 
provide a simulation facility for studies in the off-line 
programming of sensor-based robots. The functional 
requirements of providing the user with a view of the 
current status of the robot in its environment with near 
realtime updating (i.e. 5-30 frames per second), and 
providing for rich interactive dialogues for experiments in 
interactive graphics-based robot programming, led to the 
use of a powerful frame buffer display (in our case, an 
Adage 3000 graphics system) with a window-based user 
interface. Special software for the Adage 3000 bitslice 
microprocessor has been written to support multiwindow 
near realtime line-drawing and polygon faceted 3D 
renderings of a single scene. 

Another goal has been to take advantage of the 
multitasking inherent in the Harmony operating system to 
improve various aspects of interactive graphics systems. 
This has resulted in implementing a highly parallel base for 
user interaction [Tann85a], a switchboard input model 
[Tann86], and the various tools described in this paper. 

Harmony 

The Adagio system design is influenced by the 
architecture of Harmony, a multitasking realtime operating 
system with rapid inter-task message passing developed at 
the National Research Council of Canada [Gent85]. A few 
details of its properties are given here in order to aid in the 
appreciation of the multitask design presented in the paper. 

Programs written for a Harmony-based system tend to be 
implemented as a set of many small tasks. A task is often 
used as one would use a subroutine in a conventional 
operating system, except that an instance of the task must be 
explicitly created, and once it has been created it executes 
independently of, and in parallel with, the task that created 
it. Task primitives are relatively cheap: message 
communication takes little time (a send-receive-reply 
sequence takes about one millisecond), and the creation and 
destruction of tasks are inexpensive. Tasks can be created 
and destroyed as needed, and often are quite small with short 
lifetimes. 

The communication and synchronization of tasks, used 
extensively throughout this workstation design, is based on 
the send-receive-reply mechanism provided by Harmony. A 
task may send information or a request for information to 
another task by issuing the _Send command, passing a 
variable-length message. If the recipient task is alive, the 
sending task then blocks until a reply arrives from the 
recipient or the recipient is destroyed. A task receives a 
message by issuing a _Receive command, which blocks until 
a message is received, or a non-blocking _Try_receive 
command. In either case, the ID of the sending task is 
returned to the recipient task (0 if no message was waiting in 
the case of _Try_receive) along with a copy of the message 
that was sent. The _Receive or _Try_receive command may 
specify that messages are to be received from a specific task, 
or from any task. A task that has received a message from 
another task may reply to the sending task with the _Reply 
command. The _Reply command unblocks the sending task 
and returns a variable-length message to it. A 
task need not reply to a sending task immediately; replies 
may be issued at any time and in any order necessary to 
achieve synchronization. 
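
The blocking behaviour of send-receive-reply can be imitated with Python threads and queues. This is a sketch of the idea only and not the Harmony interface: the Task class and its methods are hypothetical, receiving from one specific task is omitted, a sender is not released if the recipient is destroyed, and try_receive returns None where _Try_receive returns an ID of 0.

```python
import queue
import threading


class Task:
    """A mailbox standing in for a task (illustrative; not the Harmony API)."""

    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, recipient, message):
        """Deliver a message, then block until the recipient replies."""
        reply_box = queue.Queue(maxsize=1)
        recipient._inbox.put((self, message, reply_box))
        return reply_box.get()

    def receive(self):
        """Block until a message arrives; return (sender, message, reply_box)."""
        return self._inbox.get()

    def try_receive(self):
        """Non-blocking receive; None if no message is waiting."""
        try:
            return self._inbox.get_nowait()
        except queue.Empty:
            return None

    @staticmethod
    def reply(reply_box, message):
        """Unblock the sender that owns reply_box by handing it a reply."""
        reply_box.put(message)


def run_server(server, requests_to_serve):
    """A server only receives and replies; it never sends."""
    for _ in range(requests_to_serve):
        _sender, message, reply_box = server.receive()
        Task.reply(reply_box, message.upper())


def demo():
    server, client = Task(), Task()
    worker = threading.Thread(target=run_server, args=(server, 1))
    worker.start()
    answer = client.send(server, "hello")  # blocks until the server replies
    worker.join()
    return answer
```

Because the reply box travels with the request, a server built this way may hold several requests and answer them later in any order, which is the property the text relies on for synchronization.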

The send-receive-reply paradigm also encourages the use 
of server tasks, based on the administrator concept [Gent81], 
to perform various duties for other client tasks, such as the 
managing of scarce resources. A typical server never sends 
messages; it only receives and replies to requests. It often 
will have one or more worker tasks doing the time 
consuming work so that the server can respond to requests as 
quickly as possible. Because most servers do not send 
messages, two servers that must communicate usually do so 
through a courier task created for the purpose. A courier 
alternates between sending a request for information to one 
task and sending the resulting information to the other. The 
Graphics Server, the Data Structure Server and the Tracker 
Server described below are three examples of servers used 
in Adagio. 

Hardware Configuration 

The Harmony operating system is particularly suited for 
running on a multiprocessor. Such a machine, a Chorus 
multiprocessor, consists of three single board computers, 
using Motorola 68000 processors on a Multibus backplane. 
Providing the graphics processing for the workstation is an 
Adage/Ikonas 3000 graphics system, a powerful frame 
buffer display with a 32-bit bitslice processor, a single 
68000 also running Harmony, and an image memory 1024 
by 1024 by 24 bits (512 by 512 visible). 

Interaction Model 

Adagio's design is based on the concept of a switchboard 
[Tann86]. The Switchboard, shown in Figure 1, is a server 
that corresponds with a number of input device servers, 
through couriers, and a number of client tasks that make use 
of the values from these devices. The client tasks request 
input from the Switchboard, which in turn routes to these 
tasks input from devices to which they are connected. A 
flow of messages is thus established going back and forth 
between the producers and the consumers of input. 

Graphics Server 

The Graphics Server, shown in Figure 2, replaces the 
graphics subroutine support package traditionally used in a 
single-task graphics program. Running as an independent 
task with the role of managing the frame buffer and the 
Adage bitslice processor, this server handles three types of 
request messages - window manipulation messages, 2D 
graphics messages, and 3D graphics messages. 

Screen Windows 

Screen windows in Adagio are implemented in a manner 
different from those of many other window based systems 
[Tann86]. All windows are tightly coupled, assisting in the 
single job of creating and manipulating a data structure 
defining the robot, its environment, and the actions of the 
robot. A tiled window approach is used to simplify rapid 
screen updates. The system supports the display of two 
different types of windows, 2D and 3D. Each 2D window, 
used for displaying text and 2D symbolic graphics, has its 
own associated task responsible for all activities in the 
window. All 3D windows, used for displaying different 
views of the robot and its environment, are controlled by a 
single task, the Data Structure Server, through the Display 
List Courier, because each window is a different 
representation of the same robot data structure. 

Figure 1. The Switchboard task. (Note that arrow-heads point 
away from the task making the request.) 

Figure 2. The Graphics Server. 

The Graphics Server is responsible for displaying the 
contents of both types of windows. As well, it must change 
the way in which the windows are displayed on the screen in 
response to window manipulation messages from the 
Window Manipulator task that request the creation and 
modification of the window structures maintained by the 
Graphics Server. 

2D Graphics 

The 2D graphics messages request text or symbolic 
output to be directed to the screen window associated with 
the requesting task. The Graphics Server is responsible for 
translating from the virtual coordinate system of the 
requestor to the screen coordinates of its window. If 
required, the messages to a window may be stored and then 
reinterpreted if a window is modified in size, or if a picture 
segment that potentially blocks part of the window is moved. 
This is different from many other tiled window systems that 
require the application running in the window to become 
involved with the redraw of a window that has been changed 
in size. 

3D Graphics 

The 3D graphics messages are in the form of commands 
for the Graphics Server to modify the display list. The 
display list is in turn interpreted by the Adage bitslice 
processor to render the 3D image into the frame buffer 
[Loo86]. While the microcode running in the bitslice 
processor is not strictly speaking a Harmony task, and shared 
memory is used for communication, rather than the 
send-receive-reply message passing primitives, it can still be 
viewed externally like any other independent Adagio task. 
The Graphics Server acts as an agent task, as described in 
Plebon and Booth [Pleb82], for the task running in the 
bitslice processor, providing an interface to the other 
processor and managing the communication between and 
shared resources of the two processors. Because requests 
for the bitslice processor to update the display list are 
buffered by the Graphics Server until the bitslice is ready to 
begin another update, client tasks requesting the services of 
the Graphics Server do not remain blocked for long. 

Currently, the bitslice processor cannot interrupt the 
68000 to signal completion of a screen update, so a worker 
task, the Bitslice Notifier, is created by the Graphics Server 
with the sole responsibility of informing the Graphics 
Server when the task running in the bitslice processor has 
finished. It sleeps most of the time, occasionally waking to 
poll a flag and sending a notification message to the Graphics 
Server when the screen update is finished. The incoming 
requests to modify the display list that the Graphics Server 
had been buffering are then satisfied, the bitslice processor is 
released to update the display, and the Graphics Server 
replies to the Bitslice Notifier thus releasing it to run again. 

The display list supports multiple views of a single 
environment where these views may differ in their 
viewpoints as well as their display parameters. Modifying 
display parameters from one 3D window to another permits, 
for example, the posting of shaded images of the robot in one 
window and a simple stick figure of only the axes of rotation 
of each of the joints in another - both rendered from the 
same display list. Messages, resulting from user actions or 
simulation of robot activity, can request the rotation or 
translation of robot links. These require only a change of 
the appropriate transformation matrices in the display list 
and a signal to the bitslice processor to re-render the image. 

Data Structure Server 

With the multitasking approach to the workstation, it is 
quite feasible that more than one task would wish to update 
or query the structure representing the 3D geometry of the 
robot and its environment. To prevent corruption of the 
data by concurrent accesses, the data structure is known only 
to a single task, the Data Structure Server, shown in Figure 
3. All requests for information from the data structure, for 
data structure updates, and for manipulation of 3D windows 
are fielded by the Data Structure Server. Any update that 
requires a modification of the screen image is forwarded by 
the Data Structure Server to the Graphics Server, through 
the Display List Courier to avoid blocking for a long time. 
When a major portion of the graphics data structure must be 
sent to the Graphics Server for the creation of the display 
list, a pointer to the structure is sent, thus making the 
structure temporarily known to another task. However, 
until the Display List Courier reports the completion of the 
screen update, the Data Structure Server will satisfy only 3D 
data information requests. Requests to change the data will 
be held until it is safe to do so. 
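
The rule that queries are always answered while updates are held during a screen update can be sketched as follows. This is a single-threaded C++ illustration under stated assumptions, not the Adagio implementation: the class and method names are hypothetical, and a single integer stands in for the 3D geometry.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch of the hold-updates-while-shared policy.
class DataStructureServer {
public:
    // A pointer to the structure has been handed to the Graphics Server.
    void beginDisplayUpdate() { shared_ = true; }

    // The Display List Courier has reported completion: apply held updates in order.
    void endDisplayUpdate() {
        shared_ = false;
        while (!held_.empty()) {
            held_.front()(value_);
            held_.pop_front();
        }
    }

    // Information requests are always satisfied.
    int query() const { return value_; }

    // Update requests are applied at once unless the structure is shared.
    // Returns true if the change was applied immediately, false if it was held.
    bool update(std::function<void(int&)> change) {
        if (shared_) {
            held_.push_back(std::move(change));
            return false;
        }
        change(value_);
        return true;
    }

private:
    int value_ = 0;        // stands in for the 3D geometry
    bool shared_ = false;  // true while another task can see the structure
    std::deque<std::function<void(int&)>> held_;
};
```

Because only the server ever touches the data, no locking is needed in this model; the queue of held updates plays the role of the blocked client requests.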

Tracker Server 

Management of user feedback is an important element in 
any interactive system. The user must often keep in touch 
with several activities simultaneously. He must know what 
actions are available to him and what the system is doing, and 
he needs reassurances that all is progressing as it should. 

Figure 4. Some Adagio icons. 

Figure 3. The Data Structure Server. 

One element of system feedback is the screen cursor or 
tracker. Always displayed at or near the user's centre of 
attention, a well designed tracker that changes in response to 
system activity is a powerful indicator of the state of the 
world. Tilbrook [Tilb76], [Baec80] shows that a tracker can 
provide much more information than simply the X, Y 
position of a locator device. Plebon and Booth [Pleb82, pp. 
26-28] provide additional information on the use of 
trackers as feedback and give further references. 

The term tracker is used for describing the feedback 
showing the current X, Y position of a locator device 
(mouse, tablet, joystick, etc.), because it is simple, it 
correctly describes the activity (tracking a locator device), 
and it avoids ambiguity. Although cursor is the most widely 
used term for locator feedback, it also has been associated 
with the flashing bar, line or box found on most 
alpha-numeric terminals to prompt for text input, with the 
hand-held physical device used for positioning on a graphics 
tablet (puck), and even with the cross-hair wires at the tip 
of the puck. 

The Locator Model 

The Tracker Server assumes a locator model such as the 
one provided by the Adagio Tablet Server [Tann85b]. In 
addition to X, Y coordinates, the Tablet Server returns a 
window, or ID of a predefined region on the tablet surface, 
and a status of the pointing device. The status is dependent 
on both the current state of the tablet and the state during the 
previous read operation, and may be one of: UP/UP, 
UP/NEAR, NEAR/NEAR, NEAR/DOWN, DOWN/DOWN, 
DOWN/NEAR, or NEAR/UP. 
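
The seven status values listed above are exactly the pairs of states that do not jump directly between UP and DOWN. A minimal C++ sketch of that observation follows; the type `Pen` and both function names are hypothetical, and the sketch assumes the list in the text is exhaustive.

```cpp
// Hypothetical encoding of the three tablet states.
enum class Pen { Up, Near, Down };

// True for the seven status pairs listed in the text. The only excluded
// pairs are Up/Down and Down/Up, so the test is symmetric and does not
// depend on which half of the pair is the previous read.
bool isListedStatus(Pen a, Pen b) {
    const bool skipsNear = (a == Pen::Up && b == Pen::Down) ||
                           (a == Pen::Down && b == Pen::Up);
    return !skipsNear;
}

// Count the listed pairs among all nine combinations.
int countListedStatuses() {
    const Pen all[] = {Pen::Up, Pen::Near, Pen::Down};
    int n = 0;
    for (Pen a : all)
        for (Pen b : all)
            if (isListedStatus(a, b)) ++n;
    return n;
}
```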


Icons 

In an environment where many activities may be 
controlled by a single input device, in particular a window 
based system, the tracker must be able to provide feedback 
indicating the action being performed. Having the tracker 
displayed as one of several possible graphical icons, or 
pictorial symbols, can help accomplish this. These icons, 
drawn from a set of icons defined for the system (not all of 
which are used for trackers), enrich the interaction because 
different ones may be used to indicate the window where the 
tracker currently resides, the state of the task for that 
window, and perhaps the button most recently pushed. 
Figure 4 shows some currently used icons. 


The Adagio Tracker Server, shown in Figure 5, offers 
advantages over some other approaches to tracker 
management. For example, the University of Waterloo 
Paint program [Pleb82] [Beac82] also uses a single tracking 
task, but it is a small worker task, created whenever 
interaction is permitted and destroyed when it is not. Paint 
provides little control over icons; they are compiled into the 
program. The worker task has only three kinds of trackers 
and relies on a few global variables for information about 
the trackers. 

The Tracker Server, taking advantage of the powerful 
Harmony server model, maintains the data structure of all 
icons in Adagio, allowing icons to be designated or added, to 
be removed, or to be modified. Any icon may be invoked or 
made known to the graphics hardware as the current tracker. 
The Tracker Server can bind an icon with a particular screen 
window, the status of a locator device and the button value of 
the locator (together called an icon bundle). 

Icon bundles allow different trackers to be used for 
different states of the system so that the tracker best reflects 
the user's current activity. Bundles can be stacked, so that 
old bundles can be remembered if a temporary action needs 
to use the same window, status and button information as a 
previous action. This temporary action could be, for 
example, the changing of the tracker icon to Tilbrook's 
Buddha or Macintosh's wristwatch indicating that the 
window is busy, or when, through some mode selection, a 
different tracker is required to indicate the new mode. 
Bundles subsequently can be removed to restore the previous 
bundle or to break the icon-window-status-button 
association. 
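
The stacking of icon bundles described above can be sketched as one stack per window-status-button key. The C++ below is an illustration only: `BundleKey`, `TrackerBundles`, and the use of integer IDs and string icon names are assumptions, not the Tracker Server's actual interface.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical key: (screen window ID, locator status, button value).
using BundleKey = std::tuple<int, int, int>;

class TrackerBundles {
public:
    // Bind an icon to the key, remembering any bundle already there.
    void push(const BundleKey& key, const std::string& icon) {
        stacks_[key].push_back(icon);
    }

    // Remove the newest bundle, restoring the previous one or, if none
    // remains, breaking the icon-window-status-button association.
    void pop(const BundleKey& key) {
        auto it = stacks_.find(key);
        if (it == stacks_.end()) return;
        it->second.pop_back();
        if (it->second.empty()) stacks_.erase(it);
    }

    // Icon currently bound to the key, or an empty string if none.
    std::string current(const BundleKey& key) const {
        auto it = stacks_.find(key);
        return it == stacks_.end() ? std::string() : it->second.back();
    }

private:
    // Invariant: every stored stack is non-empty.
    std::map<BundleKey, std::vector<std::string>> stacks_;
};
```

A temporary "busy" tracker is then a `push` when the action starts and a `pop` when it ends, with no need for the client to remember what the tracker was before.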



Figure 5. The Tracker Server. 



The Tracker Server allows trackers to be positioned in 
two ways, either by bundle or by naming the icon. Also, the 
server allows the tracker to be turned on or off as needed. 
The Tracker Server is device independent; it relies on the 
device dependent Tracker Display worker task to position 
the tracker on the screen. 

Complete System 

Figure 6 shows many of the tasks discussed in this paper. 
(Currently one may have 40 to 50 tasks concurrently active.) 
It is obvious from this figure that rapid message passing is 
crucial if the user is to see the results of his actions reflected 
in the image on the screen in a reasonable amount of time. 
Harmony does indeed provide such a basis, and its existence 
has encouraged the experimental techniques developed in 
this project. 

Conclusion 

The paper has described a number of servers currently in 
use in Adagio, each responsible for a certain resource. The 
use of these servers offers an abstraction between the 
specific features of the resource and the users or clients of 
the resource. The Graphics Server offers device 
independence as do many available graphics subroutine 
packages. However, the Data Structure Server extends the 
idea of independence into the realm of data structures. The 
Tracker Server hides from its clients the peculiarities of the 
particular hardware tracker. 


A second advantage of the server approach, coupled with 
the use of clients, couriers, and notifiers, is the degree to 
which it makes it simple and natural for people to structure 
and code programs for multiple tasks and multiple 
processors. Resulting programs exhibit a high degree of 
parallelism, making possible the efficient use of 
multiprocessors - a necessity considering the direction of 
hardware development. 

A high degree of modularity also results from the use of 
servers. The server model encourages the building of tools 
that are almost completely self-contained. All details 
concerning a resource can be easily encapsulated in a single 
unit. The interface is usually as simple as sending one of a 
predefined set of messages to the server and expecting one of 
a small set of predefined replies in return. 

Multitasking models of programming for interactive 
systems are not only useful for reasons of computing 
efficiency, but provide a far more appropriate base for 
computer-human interaction. Contrary to the belief of 
many system designers [Kem84], a human is not a file to be 
read. The narrow "user is a file" belief has led to the 
traditional interactive dialogue where the human uses a 
specific device in reaction to the system's commands. A 
multitasking system, made simpler to program using 
tools such as servers, can easily give far more control to the 
user by providing him with a variety of tools waiting to 
serve him. 



Figure 6. A typical configuration of the complete system. Only major tasks are shown; 
worker tasks, for example, are omitted. Significant couriers are shown as small 
unlabelled ovals. 
Abstract 

The class concept is one component of object-oriented programming 
systems which has proven useful in organizing 
complex software. In experimenting with the use of classes 
for geometric modeling applications, we have devised a class 
hierarchy that yields some conceptual order in the midst of 
diverse representations of shapes. Rather than searching for a 
uniform primitive representation, we accept the diversity and 
build a framework in which dissimilar models are combined 
in an orderly manner. 

KEYWORDS: geometric modeling, procedure models, 
object-oriented programming 


Introduction 

Geometric modeling systems can become extremely 
complex when design applications demand flexibility in the 
representation of shapes. A major challenge for programmers 
who create such systems is to preserve order in spite of this 
complexity. Trends in this direction include moves toward 
uniform data representation and very general mathematical 
representations of shapes. However, the advent of specialized 
procedural modeling techniques is a step away from uniformity 
which strains programmers’ abilities to cope with the 
diversity that it presents. We feel that using an object-oriented 
programming methodology helps to solve this problem. 

In spite of the recent interest in object-oriented programming, 
we have seen only a few published examples of 3-D 
graphics systems built in an object-oriented environment 
[Hedelman, Lorenson]. For the most part, the examples that 
we have seen emphasize the message passing aspects of 
Smalltalk-like languages and do not pay much attention to 
the effective specification of classes. 

In this paper we describe the class hierarchy of an 
experimental modeling and display system which we have 
assembled in order to study the problems of constructing 
extremely complex geometric objects. We emphasize identifying 
common elements of geometric procedures which can 
be shared among representations. Our guiding principle is 
that methods should be shared by as many classes as possible 
and that they should belong to classes as high up in the class 
hierarchy as possible. In a following section we describe our 
attempt to define a class hierarchy that meets this criterion. 


Diversity of Geometric Representations 

Geometric models used in computer graphics have 
become so diverse that they don’t seem to fit any rational 
scheme of classification. There has been a substantial 
amount of development of modelers that manipulate polygonal 
meshes, models that produce parametric surfaces or algebraic 
surfaces, and unified modeling systems that handle all 
three representations. Some of these have gone so far as to 
devise a common representation to ease the difficulty of storing 
and manipulating the diverse representations. 

The widespread use of procedure models [Newell] has 
complicated the issue even further since the procedures are 
often restricted to such narrow purposes as creating trees 
[Bloomenthal], terrain [Fournier], or grass [Reeves]. To be 
sure, there are general purpose procedures such as sweeping, 
and there are generalizations such as graftals [Smith] which 
provide a common framework for a variety of individual 
geometric procedures. 

The common simplification of reducing all complex 
shapes to polygonal approximations is no longer feasible 
when the number of primitive elements exceeds a few tens of 
thousands. Modification of such complex collections by 
users is nearly impossible. Common operations such as 
interference checking and display are confronted with massive 
amounts of data in this case. 


Classes 

One of the more successful mechanisms for coping with 
the complexity of programming is the use of classes. Originally 
a feature of the Simula language, classes are a central 
feature of Smalltalk [Goldberg]. Their value is more 
apparent when one considers that mature languages such as 
Lisp [Cannon] and C [Stroustrup] have been extended to 
include classes. 
Classes are defined by a set of instance variable declarations 
and a collection of methods [Robson]. Objects are 
instances of a class. In a procedural geometric model, the 
model itself is an object. 

The object-oriented organization gives a programmer the 
benefits of encapsulation and inheritance. To us, the more 
important property is the inheritance of methods and instance 
variable declarations. This means that objects that differ only 
slightly can be cast as members of distinct subclasses of the 
same superclass. Shared methods and instance variable 
declarations belong to the superclass. This code sharing has 
the beneficial side effect of reducing the overall code size. 
More importantly, development effort is reduced since programmers 
don’t spend time writing the same methods over 
and over. 


Evolution of the Intelligent Modeler 

The overall goal of our research is to create detailed 
geometric models from sparse non-geometric inputs. We 
share this goal with several similar projects [Feiner, Holynski, 
Friedell]. Our original intent was to produce a more or 
less conventional interactive modeling program with a 
geometric knowledge base as its central element. In operation, 
the modeler would interpret guidelines provided by the 
user to invoke rules in the knowledge base which would in 
turn generate geometric primitives. 

This approach was abandoned when we recognized it as 
an immense, monolithic procedure model. We then settled 
on a divide and conquer approach to reduce individual components 
of the modeler to manageable proportions. This then 
raised the additional problem of coordinating the actions of a 
multitude of autonomous procedures. For recursive procedures, 
such as subdivision, which maintain a geometric 
hierarchy as a product of their operation, we include methods 
by which mutually constrained objects agree on how each 
affects the other [Amburn]. So far we have no methods to 
satisfy mutual constraints for non-recursive procedure 
models. 

Reducing the size of individual intelligent procedure 
models does not diminish the overall complexity of the 
modeler. For this we need the class hierarchy described in 
the following section. 


Classes for a Modeling and Display Package 

We have built (and continue to expand) an experimental 
modeling and display package based largely on generic procedure 
models. A description of the display system is 
included in this discussion because modeling operations play 
such an important role in its operation. Figure 1 shows the 
class hierarchy of the modeling portion of the package. 
While it may seem logical to organize the classes based on 
similar properties (i.e. one superclass for curved surfaces, 
another for polyhedra, etc.), our organization is based on 
common methods. 

Diverse geometric representations can lead to a broad, 
flat organization of classes. However, our subtree for procedural 
geometry is five levels deep. This is achieved by 
factoring methods and placing the generic component as high 
in the tree as possible. Figure 2 is a condensed listing of 
four levels of one subtree (the Object class at the root of the 
entire class tree is omitted). At the lowest level is the Bezier 
subclass. Only methods specific to the Bezier subclass actually 
belong to it. 

Figure 1. Geometry class hierarchy. 

To illustrate how methods can effectively be shared, 
consider the class of subdividable surfaces (the lower three 
levels of figure 2). A typical display algorithm for this type 
of surface calls for recursive subdivision to a predetermined 
level of detail followed by the tiling of polygons formed 
from the mesh of points generated by the last subdivision 
stage (figure 3). The subdivision method belongs to the individual 
subclass. The tiling method, on the other hand, 
belongs to the class of subdividable surfaces instead of the 
individual subclasses. Likewise, the test for level of detail 
belongs to the superclass rather than its descendants. This 
method inheritance is possible because the result of subdivision 
is a collection of vertices, and neither the tiler nor the 
level of detail test cares how the vertices were produced. 
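
The factoring described in this paragraph can be tried out in a small, self-contained C++ sketch. To stay short it substitutes a cubic Bezier curve for the bicubic patch and a fixed recursion depth for the level of detail test; the names `Point`, `Segment`, and `BezierCurve`, and the choice of line segments as the tiler's output, are assumptions made for illustration and are not taken from the package.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point { double x, y; };
using Segment = std::pair<Point, Point>;

// Superclass: owns the generic expand, the level of detail test, and the tiler.
class SubGeometry {
public:
    virtual ~SubGeometry() = default;

    void expand(int levelsLeft, std::vector<Segment>& out) const {
        if (terminationTest(levelsLeft)) { tile(out); return; }
        for (const auto& child : subdivide())
            child->expand(levelsLeft - 1, out);
    }

protected:
    virtual std::vector<Point> vertices() const = 0;  // result of subdivision
    virtual std::vector<std::unique_ptr<SubGeometry>> subdivide() const = 0;

private:
    bool terminationTest(int levelsLeft) const { return levelsLeft <= 0; }

    // The tiler sees only vertices; it does not know how they were produced.
    void tile(std::vector<Segment>& out) const {
        const std::vector<Point> v = vertices();
        for (std::size_t i = 0; i + 1 < v.size(); ++i)
            out.push_back({v[i], v[i + 1]});
    }
};

// Subclass: supplies only the geometry specific subdivision.
class BezierCurve : public SubGeometry {
public:
    explicit BezierCurve(const std::array<Point, 4>& p) : p_(p) {}

protected:
    std::vector<Point> vertices() const override { return {p_[0], p_[3]}; }

    // de Casteljau split of the curve at its parametric midpoint.
    std::vector<std::unique_ptr<SubGeometry>> subdivide() const override {
        const Point a = mid(p_[0], p_[1]), b = mid(p_[1], p_[2]), c = mid(p_[2], p_[3]);
        const Point d = mid(a, b), e = mid(b, c), f = mid(d, e);
        const std::array<Point, 4> left = {{p_[0], a, d, f}};
        const std::array<Point, 4> right = {{f, e, c, p_[3]}};
        std::vector<std::unique_ptr<SubGeometry>> kids;
        kids.push_back(std::unique_ptr<SubGeometry>(new BezierCurve(left)));
        kids.push_back(std::unique_ptr<SubGeometry>(new BezierCurve(right)));
        return kids;
    }

private:
    static Point mid(Point a, Point b) { return {(a.x + b.x) / 2, (a.y + b.y) / 2}; }
    std::array<Point, 4> p_;
};
```

Adding another subdividable representation under this arrangement means writing only `vertices` and `subdivide`; the expansion loop, termination test, and tiler are inherited unchanged.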

// The class of all geometric elements 
class Geometry 
{ 
    // GEOMETRIC INFORMATION 
    int numVerts; 
    vertex **v; 
    BOOL bBoxSet;               // boolean for whether bBox is set 
    boundingBox bBox;           // bounding box parameters 
    matrix tform;               // transformation matrix 

    // SURFACE PROPERTIES 
    properties *color;          // surface color information 

    // RENDERING INFORMATION 
    rendParam rendInfo;         // rendering parameters (shading) 
    float detail;               // level of detail 

    // note: actual position information is defined by subclasses. 
    // 'v' should be set up to point to position information. 
    // This allows common methods to be implemented more easily. 

public: 
    // GEOMETRIC INFORMATION 
    boundingBox getBBox();      // return bounding box info 
    normal getNormal();         // return surface normal 

    // TRANSFORM GEOMETRY 
    void rotate(axis,float); 
    void translate(float,float,float); 
    void scale(float,float,float); 
    void transform(matrix);     // transform according to supplied matrix 

    void tile();                // generate rendering primitives 
    void expand();              // expand geometry if possible 
                                // for example: subdivide 
    void collapse();            // reduce geometric complexity 
    void textDumpGeometry();    // show geometric info in various forms 
    void graphDumpGeometry(); 
}; 

// The class of procedurally modeled geometry 
class ProcedureGeometry: public Geometry 
{ 
}; 

// The class of all subdividable procedure models 
class SubGeometry: public ProcedureGeometry 
{ 
    int level;                  // number of times subdivided 

public: 
    void subdivide(float); 
    BOOL terminationTest(float); // level of detail test 
    void changeBasis(int); 
    void tile(); 
}; 

// The class of subdividable bicubic Bezier patches 
class Bezier: public SubGeometry 
{ 
    Bezier *parent;             // pointer to parent of this patch 
    vertex cpoints[4][4];       // the control points 
    float umin;                 // u parameter range in original patch 
    float umax; 
    float vmin;                 // v parameter range in original patch 
    float vmax; 
    Bezier *children[4];        // pointers to subpatches of this patch 

public: 
    void subdivide(float);      // redefine superclass methods 
}; 

Figure 2. Condensed listing of four levels of one subtree of the 
class hierarchy. 


// expand: a generic subdivision method 
// class: SubGeometry 

SubGeometry.expand() 
{ 
    if (this->terminationTest(this->detail) == DONE) 
        this->tile(); 
    else 
        this->expandIt(); 
} 

// expandIt: geometry specific subdivision 
// class: Bezier 

Bezier.expandIt() 
{ 
    // split current patch into four 
    // subpatches and expand each 
    // one in turn 

    new patch1; patch1.f00(this); 
    patch1.expand(); delete patch1; 

    new patch2; patch2.f01(this); 
    patch2.expand(); delete patch2; 

    new patch3; patch3.f10(this); 
    patch3.expand(); delete patch3; 

    new patch4; patch4.f11(this); 
    patch4.expand(); delete patch4; 
} 

Figure 3. A subdivision procedure divided into a generic 
method and a subclass specific method. 


The display classes outlined in figure 4 are quite 
different from ones proposed early in the project. The initial 
hierarchy had fewer levels and far more classes than the final 
design. For example, we originally proposed a separate 
display class for BSP trees [Fuchs]. However, the BSP tree 
display algorithm is really a sorting method for geometric 
objects followed by a tiling method which deposits polygons 
in an image buffer. After factoring the algorithm into its 
constituent elements, and defining each element as a method 
of the most appropriate class, we can easily use the BSP tree 
sorting method in ray tracing to reduce the number of ray- 
surface intersection tests. 

There are a number of commonly used algorithms, such 
as ray tracing, which are being ripped apart and reconstituted 
as methods belonging to various geometry and display 
classes. In general, as we recognize the common elements in 
different classes, the class tree becomes deeper and narrower. 


Conclusion 

This paper is not a sermon on the virtues of object- 
oriented programming. It is concerned instead with its application 
to modeling and display problems. 

Organizing modeling representations into a class hierarchy 
has drastically altered the way we view geometric 
models. In particular, we have factored methods into generic 
and specific parts so that the generic parts can be shared. 
We have also found ourselves looking for ways to apply 
those methods that we have on hand to a broader range of 
geometric problems. We feel that our approach not only 
makes the programming of the modeling and display package 
more manageable, but it expands the range of features that 
the package can support. 
APPLICATIONS OF WORLD PROJECTIONS 

Abstract 

Various techniques have been developed which employ 
projections of the world as seen from a particular 
viewpoint. [Blinn and Newell] introduced reflection mapping 
for simulating mirror reflections on curved surfaces 
and their method can be extended to simulate refraction. 
[Miller and Hoffman] have presented a general illumination 
model based on world projections. [Greene] has 
used projections of the world to model distant objects, 
and [Greene and Heckbert] have used world projections 
to produce pictures with the fisheye distortion required 
for Omnimax® frames. World projections can also be 
used as a backdrop for ray tracing or beam tracing. 

This paper proposes a uniform framework for representing 
and utilizing world projections and argues that the 
best general purpose representation is the projection 
onto a cube. Surface shading and texture filtering issues 
related to environment mapping are discussed including 
approximate methods for obtaining diffuse and specular 
shading values from prefiltered environment maps. It is 
noted that obtaining accurate diffuse reflection and 
antialiasing specular reflection, which are both problematical 
with ray tracing, can be effectively handled by 
environment mapping. 


Keywords: Environment mapping, reflection mapping, 
surface shading, texture mapping, cube projection, Mercator 
projection. 


Omnimax is a registered trademark of Imax Corporation, 
Toronto, Canada. 
1. INTRODUCTION 

Reflection mapping, introduced by Blinn and Newell in 
1976, is a shading technique that uses a projection of the 
world (a “reflection map”) as seen from a particular 
viewpoint (the “world center”) to make rendered surfaces 
appear to be reflecting their environment. The 
mirror reflection of the environment at a surface point is 
taken to be the point in the world projection corresponding 
to the direction of a ray from the eye as reflected by 
the surface. Consequently, reflections are geometrically 
accurate only if the surface point is at the world center 
or if the reflected object is greatly distant. The 
geometric distortion of reflections increases as the distance 
from the surface point to the world center 
increases and as the distance from the reflected object to 
the world center decreases. To apply reflection mapping 
to a particular object, the most satisfactory results are 
usually obtained by centering the world projection at the 
object center. 
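
The direction used to index the reflection map is the eye ray mirrored about the surface normal. A minimal C++ sketch of that standard computation follows; the names `Vec3`, `dot`, and `reflect` are illustrative, and the sketch assumes the normal has unit length and the incident vector points from the eye toward the surface.

```cpp
struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror the incident direction i about the unit surface normal n:
// r = i - 2 (n . i) n
Vec3 reflect(Vec3 i, Vec3 n) {
    const double k = 2.0 * dot(n, i);
    return {i.x - k * n.x, i.y - k * n.y, i.z - k * n.z};
}
```

Only the direction of the result is used for the lookup, which is why the position of the surface point relative to the world center introduces the geometric error discussed above.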

This method for approximating reflections can be 
extended to encompass refraction. Obtaining accurate 
results, however, requires much more computation since 
the ray from the eye should be “ray traced” through the 
refractive object, and in this process the ray usually 
splits into reflected and refracted components at surface 
intersections. As with reflections, results are only 
approximate for geometric reasons. (Note: Simply bending 
the ray at the surface point and using this as the 
direction of the refracted ray is not accurate, but may 
convey the impression of refraction.) 

As Miller and Hoffman have described, the concept of 
reflection mapping may be thought of in more general 
terms as an illumination model. Essentially, they treat a 
world projection as an area light source which produces 
sharp reflections in smooth glossy objects and diffuse 
reflections in low gloss objects. This is a good model of 
illumination in the real world, although shadows are not 
explicitly handled and, as with reflection mapping, 
results are only approximate for geometric reasons. In 
order to speak generically of this approach and conventional 
reflection mapping, the term “environment mapping” 
will be applied in this paper to techniques for 
shading and texturing surfaces which employ a world 
projection. 

2. STRENGTHS OF ENVIRONMENT MAPPING 

Environment mapping is often thought of as a poor 
man's ray tracing, a way of obtaining approximate 
reflection effects at a fraction of the computational 
expense. While ray tracing is unquestionably the more 
versatile and comprehensive technique, handling shadows 
and multiple levels of reflection and refraction [Whitted], 
it is interesting to note that environment mapping is 
superior in some ways, quite aside from the enormous 
advantage in speed. Obtaining accurate diffuse reflection 
and antialiasing specular reflection are both problematical 
with ray tracing since it point samples the three 
dimensional environment (Amanatides’ approach of ray 
tracing with cones is an exception). Environment mapping 
can effectively handle these problems by filtering 
regions of the world projection. Under many circumstances, 
for example when a large area light source 
illuminates a low gloss object, the subjective quality of 
reality cues produced by environment mapping is superior 
to that produced by unadorned ray tracing. While 
it is noted that refinements to ray tracing proposed by 
[Cook-Porter-Carpenter] and [Amanatides] address the 
problems of aliasing and diffuse reflection, their methods 
increase several-fold the already extreme computational 
cost. (Incidentally, there is an interesting parallel 
between the way these refinements work and the use of 
texture filtering in environment mapping: both attempt 
to integrate over a region of the environment.) It should 
also be mentioned that ray tracing and environment 
mapping can be used in combination where foreground 
objects are ray traced and a world projection representing 
distant objects serves as a backdrop (Lucasfilm’s 
“1984” serves as an example [Cook-Porter-Carpenter, 
Figure 8]). 

3. AN EXAMPLE 

The dimetrodon lizard of Plate 1 was rendered with 
reflection mapping (i.e. environment mapping with mirror 
reflections). Inspection of this image reveals an 
inherent limitation with reflection mapping: since the 
reflecting object is not normally in the world projection, 
the reflecting object cannot reflect parts of itself, e.g. the 
legs are not reflected in the body. (Actually, this limitation 
can be partially overcome by using different world 
projections for different parts of an object.) But overall, 
reflection mapping performs well in this scene. The horizon 
and sky, which are the most prominent reflected 
features, are accurately reflected due to their distance 
from the reflecting object. The reflections of the tree 
and the foreground terrain are less accurate because of 
their close proximity, but surface curvature makes this 
difficult to recognize. As this example shows, reflections 
don't need to be accurate to look realistic, although 
attention should be paid to scene composition. Planar 
reflecting surfaces may cause problems, for example, 
since the distortion of reflections in them may be quite 
noticeable. 

4. RENDERING A CUBE PROJECTION 

Use of environment mapping presupposes the ability to 
obtain a projection of the complete environment. Suppose 
that we wish to obtain a world projection of a three 
dimensional synthetic environment as seen from a particular 
viewpoint. One method is to position the camera at 
the viewpoint and project the scene onto a cube by 
rendering six perspective views, each with a 90 degree 
view angle and each looking down an axis of a coordinate 
system with its origin at the viewpoint. The world 
projection used to shade the lizard in Plate 1 was 
created in this manner and the resulting cube is shown 
unfolded in Plate 2. This method of creating world projections 
has also been used by Miller and Hoffman as an 
intermediate step in creating a Mercator projection of 
the world, the format they prefer for environment mapping. 
Blinn and Newell also use a Mercator projection. 
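
Because each of the six views has a 90 degree view angle, a direction from the world center falls on the face whose axis has the largest absolute component, and dividing the other two components by that magnitude gives face coordinates in the range -1 to 1. The C++ sketch below illustrates this; the face numbering, the orientation of the (u, v) axes, and the names `CubeSample` and `cubeLookup` are assumptions, since the paper does not specify a layout.

```cpp
#include <cmath>

// Assumed face numbering: 0 +X, 1 -X, 2 +Y, 3 -Y, 4 +Z, 5 -Z.
struct CubeSample {
    int face;
    double u, v;  // position on the face, each in [-1, 1]
};

// Map a non-zero direction from the world center to a cube face position.
CubeSample cubeLookup(double x, double y, double z) {
    const double ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    if (ax >= ay && ax >= az) return {x > 0 ? 0 : 1, y / ax, z / ax};
    if (ay >= az)             return {y > 0 ? 2 : 3, x / ay, z / ay};
    return {z > 0 ? 4 : 5, x / az, y / az};
}
```

For example, the direction (0, -2, 1) is dominated by its negative Y component, so it lands on face 3 at (u, v) = (0, 0.5).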

5. SURFACE SHADING AND TEXTURE FILTERING 

Light reflected by a surface is assumed to have a diffuse 
component and a specular component. The diffuse component 
represents light that is scattered equally in all 
directions; the specular component represents light that 
is reflected at or near the mirror direction. This discussion 
of surface shading will be confined to obtaining 
diffuse and specular illumination from a world projection; 
the problem of combining this information along 
with surface properties (color, glossiness, etc.) to obtain 
the color reflected at a surface point will not be considered. 
See [Cook and Torrance] for a general discussion 
of surface shading, [Miller and Hoffman] for a discussion 
of shading in the context of environment mapping. 

Diffuse illumination at a surface point comes from the 
hemisphere of “sky” in the world projection centered on 
the surface normal, and it can be found by filtering the 
region of the world projection corresponding to this hemisphere. 
Filtering should be done according to Lambert's 
law which states that the illumination coming from a 
point on the hemisphere should be weighted by the 
cosine of the angle between the direction of that point 
and the surface normal. 
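
The cosine weighting can be illustrated with a short C++ sketch that filters a list of discrete "sky" samples. The sample representation (`SkySample` with a unit direction, a radiance value, and a solid angle) and the normalization by the total weight are assumptions made for this sketch; the paper does not prescribe either.

```cpp
#include <vector>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One sample of the world projection (hypothetical representation).
struct SkySample {
    Vec3 dir;           // unit direction from the world center
    double radiance;    // monochrome intensity seen in that direction
    double solidAngle;  // area of "sky" the sample covers
};

// Cosine-weighted average of the samples in the hemisphere about unit normal n.
double diffuseIllumination(const std::vector<SkySample>& sky, Vec3 n) {
    double sum = 0.0, weight = 0.0;
    for (const SkySample& s : sky) {
        const double c = dot(s.dir, n);
        if (c <= 0.0) continue;  // outside the hemisphere
        const double w = c * s.solidAngle;
        sum += w * s.radiance;
        weight += w;
    }
    return weight > 0.0 ? sum / weight : 0.0;
}
```

The solid angle term matters for a cube projection because pixels near the corners of a face cover less sky than pixels at its center.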

A region of the world projection should also be filtered 
to obtain the specular illumination component. Figure 1 
shows the cone of “sky” reflected by a surface which 
corresponds to a pixel in the output image. An obvious 
approach to determining the specular illumination at a 
pixel is to find the region of the world projection subtended 
by the corresponding “reflection cone” and then 
average the pixels in this region. While this approach 
will produce reasonable results, they will not be optimal 
for several reasons. One problem is that the size of the 
region filtered should be influenced by surface roughness 
at the pixel level, since rough surfaces scatter specular 
illumination (thus, a larger area should be filtered if the 
surface is rough). Secondly, for theoretical reasons filtering 
should not be restricted to the region of texture 
corresponding to pixel bounds, and averaging of pixel 
values should be weighted by proximity to the center of 
the region being filtered. Subsequent references to 
reflection cones are made bearing in mind that they indicate 
only the approximate boundaries of regions to be 
filtered. See [Heckbert] for a general discussion of texture 
filtering. 

Figure 1. Rays from viewpoint through corners of a screen 
pixel are reflected by a surface, defining a reflection cone. 

While the details of shading formulas are beyond the 
scope of this paper, a rough rule for shading mono¬ 
chrome surfaces at a surface point is: 

reflected color = dc * D + sc * S , where

dc is the coefficient of diffuse reflection 
D is the diffuse illumination 
sc is the coefficient of specular reflection 
S is the specular illumination 

(Note: dc and sc depend mainly on surface glossiness) 
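
As a small worked example of the rule above, the following sketch evaluates it for a range of surfaces between a Lambertian reflector and a perfect mirror. The helper `sphere_weights` and the constraint that dc and sc sum to one are assumptions introduced here for illustration; the text only says that both coefficients depend mainly on glossiness.

```python
def reflected_color(dc, D, sc, S):
    """Rough monochrome shading rule: dc * D + sc * S."""
    return dc * D + sc * S

def sphere_weights(t):
    """Hypothetical blend: t = 0 is completely diffuse (dc = 1, sc = 0),
    t = 1 is completely specular (dc = 0, sc = 1)."""
    return 1.0 - t, t

# Example: diffuse illumination 0.4 and specular illumination 0.9
# at the same surface point, for three levels of glossiness.
shades = []
for t in (0.0, 0.5, 1.0):
    dc, sc = sphere_weights(t)
    shades.append(reflected_color(dc, 0.4, sc, 0.9))
```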


Plate 4 shows three monochrome spheres reflecting the 
environment of Plate 2. The relative weighting of the 
diffuse and specular components varies from completely 
diffuse on the left (a Lambertian reflector) to completely
specular on the right (a perfect mirror). 

6. CUBE PROJECTIONS VS. MERCATOR PROJEC¬ 
TIONS 

When world projections are rendered from three dimen¬ 
sional models (rather than being photographed or 
painted), the cube representation of the world is pre¬ 
ferred to a Mercator projection for reasons of computa¬ 
tional efficiency and image quality. Rendering the cube 
projection is normally required in both cases. The 
further step of creating a Mercator projection from the 
cube projection requires additional computation, and the 
added generation of filtering can only degrade image 
clarity. 

Moreover, Mercator projections are non-linear which 
complicates texture filtering since the region of texture 
subtended by a reflection cone does not have a polygonal 
boundary. (With cube projections, reflection cones 
always subtend polygonal regions of cube faces.) Figure 
2 shows the regions subtended by the same reflection 
cone in Mercator and cube projections. Accurate filter¬ 
ing of the subtended region in the Mercator projection 
(the upper region) is problematical due to its irregular 
shape and the fact that pixels in the projection 
correspond to widely varying areas of “sky.” Admittedly, 
filtering a cube projection presents a different problem: 
the multiplicity of regions to be filtered, five in the 
example of Figure 2. Fortunately, these problems occur 
only when surface curvature is high at the pixel level, 
which is not usually the case. Low surface curvature 
produces narrow reflection cones which map to approxi¬ 
mately quadrilateral areas in a Mercator projection and 
which usually map to a quadrilateral on a single face of a 
cube projection. 
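
The difference between the two formats can be made concrete with a sketch of how a direction is mapped into each one. The face names, the (u, v) conventions, and the choice of z as "up" are assumptions made here. The second function uses a plain latitude-longitude parameterization as a stand-in for the Mercator-style format, which is also an assumption.

```python
import math

def direction_to_cube(d):
    """Return (face, u, v) with u, v in [-1, 1] for a nonzero direction d.
    The face is chosen by the coordinate of largest magnitude, and the
    mapping onto that face is a simple perspective division."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        u, v = y / ax, z / ax
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        u, v = x / ay, z / ay
    else:
        face = "+z" if z > 0 else "-z"
        u, v = x / az, y / az
    return face, u, v

def direction_to_latlong(d):
    """Return (longitude, latitude) in radians, with z taken as 'up'.
    The mapping is non-linear in the direction components."""
    x, y, z = d
    r = math.sqrt(x * x + y * y + z * z)
    return math.atan2(y, x), math.asin(z / r)
```

Because the cube mapping is a perspective division within each face, straight edges of a reflection cone remain straight on a face, which is why the subtended regions are polygonal.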



(right). The reflection cone covers about 3/8 of the 
world. 

7. PREFILTERED WORLD PROJECTIONS 

As noted above, the width of a reflection cone depends 
on surface curvature so the area of “sky” reflected at a 
pixel can be arbitrarily large. A sphere covering a single 
pixel, for example, reflects nearly the entire world. 
Thus, environment mapping normally requires filtering 
large areas of the world projection at some pixels, even 
to produce reflections in mirror surfaces, so approximate 
filtering methods utilizing prefiltered texture greatly 
enhance efficiency. 

As Miller and Hoffman describe, a diffuse illumination 
map can be created by convolving the world projection 
with a Lambert’s law cosine function, a kernel covering 
one hemisphere of the world. This map, which is 
indexed by surface normal, may be thought of as indicating
colors reflected by a monochrome spherical
Lambertian reflector placed at the world center. Since 
the diffuse illumination map has little high frequency 
content it may be computed and stored at low resolution 
and accessed with bilinear interpolation. Thus, 
prefiltering the world projection in this manner reduces 
the problem of finding the diffuse illumination at a sur¬ 
face point to a table lookup. Plate 3 is a magnified 
diffuse illumination map in Mercator projection format 
corresponding to the world projection of Plate 2. 
Incidentally, since diffuse illumination maps usually 
change only subtly from frame to frame, for animation 
applications it may suffice to create these maps at inter¬ 
vals (say every tenth frame) and interpolate between 
them. 
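
The table lookup and the frame-to-frame interpolation mentioned above can be sketched as follows. The map is represented as a nested list of monochrome values, and the function names are hypothetical. Wrap-around at the map edges is ignored for brevity.

```python
def bilinear_lookup(table, x, y):
    """Sample a 2D list table[row][col] at fractional position
    (x = column, y = row) using bilinear interpolation."""
    rows, cols = len(table), len(table[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = table[y0][x0] * (1 - fx) + table[y0][x1] * fx
    bottom = table[y1][x0] * (1 - fx) + table[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def blend_maps(map_a, map_b, t):
    """Linearly interpolate two maps of equal size.
    t = 0 returns map_a, t = 1 returns map_b."""
    return [[a * (1 - t) + b * t for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]
```

For maps computed every tenth frame, `blend_maps` would be called with t equal to the fraction of the interval elapsed at the current frame.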

Specular illumination can be obtained using fast, approx¬ 
imate filtering techniques developed for texture mapping 
surfaces. Our present implementation of environment 
mapping uses a prefiltered cube projection in the form of 
six “mip maps” [Williams], one for each cube face. Mip 
mapping is fast, but can only filter square regions of tex¬ 
ture, so results are only approximate (in the example of 
Figure 2, the polygonal areas on the cube faces would 
need to be approximated by squares). [Crow] and [Perlin]
have proposed similar approximate filtering techniques.
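
A minimal sketch of the prefiltering idea for one cube face is given below. It builds a pyramid of successively averaged images and picks the level whose texels roughly match the side of the square region to be filtered. Selecting a single nearest level, rather than interpolating between levels, is a simplification made here, and the function names are hypothetical.

```python
import math

def build_mip_pyramid(image):
    """image: square 2D list whose side is a power of two.
    Returns a list of levels; level 0 is the full-resolution image and
    each later level averages 2 x 2 blocks of the previous one."""
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        half = len(src) // 2
        dst = [[(src[2 * r][2 * c] + src[2 * r][2 * c + 1] +
                 src[2 * r + 1][2 * c] + src[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(half)]
               for r in range(half)]
        levels.append(dst)
    return levels

def mip_level_for_square(side_in_texels, num_levels):
    """Level whose texels are about side_in_texels wide at level 0."""
    if side_in_texels <= 1.0:
        return 0
    return min(int(round(math.log2(side_in_texels))), num_levels - 1)
```

Filtering a large square region then costs one lookup in a coarse level instead of a sum over every texel in the region.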

8. CHEAP CHROME EFFECTS 

Environment mapping also has wide application where 
the objective is to produce a striking visual effect 
without particular regard for realism. Often the intent 
is to give objects a chrome plated look and the content 
of reflections is unimportant. In Robert Abel and 
Associate’s “Sexy Robot” animation, for example, the 
reflection map was a smooth color gradation from earth 
colors at low elevations to sky colors at high elevations, 
and at a given elevation color was constant [Byles]. For 
applications of this sort, a colormap (or other one dimen¬ 
sional color table) suffices to specify the color gradation, 
and rendering computation can be greatly reduced by 
filtering this table instead of performing two dimensional 
texture filtering. Since an environment map isn’t used, 
memory requirements are minimal and no setup time is 
required to prefilter texture. Highlights can be produced 
by adding point light sources independently of environ¬ 
ment mapping. 
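
The one dimensional table approach can be sketched as below. The table is indexed by the elevation of the reflected direction only, so color is constant at a given elevation. The table layout (first entry at the nadir, last at the zenith), the choice of z as "up", and the function names are assumptions. The sketch performs a single interpolated lookup; filtering for a wide reflection cone would average a span of the table instead.

```python
import math

def elevation_of(reflection):
    """Elevation angle in [-pi/2, pi/2] of a direction; z is taken as 'up'."""
    x, y, z = reflection
    return math.asin(z / math.sqrt(x * x + y * y + z * z))

def chrome_color(table, reflection):
    """Look up a 1D color table (at least two RGB entries) by elevation,
    interpolating linearly between adjacent entries."""
    t = (elevation_of(reflection) + math.pi / 2) / math.pi  # 0 nadir, 1 zenith
    pos = t * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    f = pos - i
    return tuple(a * (1 - f) + b * f for a, b in zip(table[i], table[i + 1]))
```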

9. MODELING BACKGROUND OBJECTS WITH 
WORLD PROJECTIONS 

While the discussion thus far has been confined to using 
world projections for surface shading, they can also be 
used to model background objects. As described in 
[Greene], the sky component in frames of moving camera 
animation can be modeled as a half-world projection, for 
example the upper half of a cube. As the camera moves 
through the scene the appropriate region of the projec¬ 
tion comes into view. Of course the advantage of this 
technique is speed: a scene element (in this case the sky) 
can be rendered from the world projection with texture 
mapping which, for complex scenes, is much faster than 
rendering from corresponding three dimensional models. 
This method assumes that objects in the sky are greatly 
distant from the camera, and results are only approxi¬ 
mate when this is not the case. Plate 5 is a frame from 
animation in which the sky component was rendered 
from texture painted on a half-world projection. 

Since half-world models cover the whole sky, they are 
particularly useful for creating world projections for 
environment mapping. The sky component of Plate 2 
was modeled by projecting a 180 degree fisheye photo¬ 
graph of sky onto a half cube, shown in isolation in Plate 
6. This model served two purposes in producing the picture
of Plate 1, modeling the sky in the background and 
making the world projection which was used to shade 
the lizard. Incidentally, using photographic texture for 
sky models is an attractive option due to the difficulty of 
synthesizing complex sky scenes with realistic cloud 
forms and lighting effects, and using a 180 degree fisheye 
lens allows the whole sky to be photographed at once, 
thus avoiding problems associated with photo mosaics. 

In scenes of animation where the camera rotates but 
doesn’t change location, this approach to modeling sky 
can be extended to modeling the whole background. In 
this case, a single world projection centered at the cam¬ 
era is generated and the background can be rendered 
directly from this projection for the frames of animation. 
The geometry of the scene is faithfully reproduced 
regardless of the distance of objects in the scene from 
the camera; results are not just approximate. There are 
also limited applications of this technique to moving 
camera animation. A world projection of the distant 
environment centered at a typical camera position can 
be used to render distant objects in the scene while the 
near environment is rendered from models at each frame 
and then composited with the distant environment. This 
approach is analogous to using foreground and back¬ 
ground levels in cel animation. As before, it is con¬ 
venient to represent the world as a cube projection since 
it can be rendered directly from three dimensional 
models. 

Rendering background objects from a world projection is 
a form of texture mapping since each pixel in the output 
image is rendered by determining the corresponding 
region in the world projection and then filtering a neigh¬ 
borhood of this region. Assuming that the world is 
represented as a cube projection, the cube faces should 
have substantially higher resolution than the output 
frames since only a small part of the world is subtended 
by the viewing pyramid for typical view angles. For 
example, a viewing pyramid with a 3:4 aspect ratio and 
a 45 degree vertical view angle subtends only about six 
percent of the world. Prefiltering the world projection is 
unnecessary since only small regions of texture are 
filtered. 
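
The six percent figure can be checked with a short calculation of the solid angle of a rectangular viewing pyramid, expressed as a fraction of the full sphere. The sketch assumes that the 3:4 aspect ratio means height to width, so the width is 4/3 of the height. The function name is hypothetical.

```python
import math

def pyramid_fraction_of_world(vertical_fov_deg, width_over_height):
    """Fraction of the full sphere subtended by a rectangular viewing pyramid."""
    # Half-extents of the image rectangle on a plane at unit distance.
    half_h = math.tan(math.radians(vertical_fov_deg) / 2.0)
    half_w = half_h * width_over_height
    # Solid angle of a centered rectangle seen from unit distance.
    solid_angle = 4.0 * math.atan(
        half_w * half_h / math.sqrt(1.0 + half_w ** 2 + half_h ** 2))
    return solid_angle / (4.0 * math.pi)

fraction = pyramid_fraction_of_world(45.0, 4.0 / 3.0)
```

Under these assumptions the fraction comes to about 0.059. As a sanity check, a square pyramid with a 90 degree view angle covers exactly one cube face, or one sixth of the world.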

This approach to rendering background objects suggests 
a method for performing motion blur. The region of tex¬ 
ture traversed by the projection of an output pixel in the 
time interval between frames is determined, and then 
this region is filtered. The simplicity of performing accu¬ 
rate motion blur in this situation is due to the two 
dimensional nature of the model. The filter employed 
should be spatially variant because different output pix¬ 
els may traverse differently shaped paths (this occurs, for 
example, if the camera rolls). Approximate results can 
be obtained by filtering elliptical regions in the world 
projection. Greene and Heckbert present an efficient 
method of filtering arbitrarily oriented elliptical areas. 
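
A much simplified sketch of the motion blur idea follows. It averages texture samples along the straight segment that an output pixel's projection traverses in the texture between frames. The straight path, the box weighting, and the nearest-texel sampling are all simplifying assumptions made here. The elliptical filtering mentioned above is not reproduced.

```python
def motion_blur_sample(texture, start, end, steps=16):
    """Average nearest-texel samples of texture[row][col] along the
    segment from start to end, each given as (column, row)."""
    rows, cols = len(texture), len(texture[0])
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps  # midpoint of each sub-interval
        x = start[0] + (end[0] - start[0]) * t
        y = start[1] + (end[1] - start[1]) * t
        c = min(max(int(round(x)), 0), cols - 1)
        r = min(max(int(round(y)), 0), rows - 1)
        total += texture[r][c]
    return total / steps
```

Because the start and end points differ from pixel to pixel, the effective filter is spatially variant even in this simple form.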

10. NON-LINEAR PROJECTIONS 

In addition to using world projections to generate per¬ 
spective views of an environment, they can also be used 
to create non-linear projections such as the fisheye pro¬ 
jection required for Omnimax frames. (The screen in an 
Omnimax theater is hemispherical and film frames are 
projected through a fisheye lens [Max].) Greene and 
Heckbert have obtained Omnimax projections of three 
dimensional scenes by projecting the scene onto a cube 
centered at the camera at each frame and then filtering 
regions of the cube faces to obtain pixels in the output 
image. This technique is very similar to the method 
described in the preceding section for rendering back¬ 
ground objects from world projections. Plate 7 is an 
Omnimax projection made from the cube projection of 
Plate 2. 
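
The first step of such a non-linear projection, finding the direction seen by each output pixel, can be sketched as follows. The sketch assumes an equidistant fisheye model with a 180 degree field of view looking along +z; the actual Omnimax projection geometry is not specified in the text and may differ. The resulting direction would then be used to locate and filter a region of the cube faces.

```python
import math

def fisheye_pixel_to_direction(px, py, size, fov_deg=180.0):
    """Map pixel (px, py) of a size x size fisheye image to a unit
    direction, or None if the pixel lies outside the image circle.
    Assumes an equidistant fisheye: angle from the axis grows linearly
    with distance from the image center."""
    nx = 2.0 * (px + 0.5) / size - 1.0  # pixel center in [-1, 1]
    ny = 2.0 * (py + 0.5) / size - 1.0
    r = math.hypot(nx, ny)
    if r > 1.0:
        return None
    theta = r * math.radians(fov_deg) / 2.0  # angle from the optical axis
    phi = math.atan2(ny, nx)                 # angle around the optical axis
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))
```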

11. CONCLUSION 

The projection of the environment onto a cube is a con¬ 
venient and efficient format for world projections for the 
applications cited in this paper. In the context of a 
graphics system which employs world projections for 
multiple applications, the advantages of using a standard 
format are obvious: projection and filtering software is 
simplified, multiplicity of picture formats is avoided, etc. 
Generally, the methods described are ways of approximating
a three dimensional problem with a two dimensional
problem in order to reduce computational expense: 
environment mapping approximates ray tracing, render¬ 
ing background objects from world projections with tex¬ 
ture mapping approximates image rendering from three 
dimensional models. The subjective quality of reality 
cues in images produced with these approximate 
methods often compares favorably to results obtained by 
more expensive image generation techniques, and for 
complex environments approximate techniques may be 
the only practical way of producing animation having 
the desired features with moderate computing resources. 

Plate 1. Dimetrodon lizard model reflecting its environment. 
(Lizard modeled by Dick Lundin, tree modeled by Jules 
Bloomenthal) 



Plate 5. Frame from animation utilizing a half-world sky 
model with painted texture. (Note: The reflection in the lake 
was not produced with reflection mapping.) 



Plate 6. Half-world sky model made with photographic 
texture. 




Plate 3. Diffuse illumination map of the environ¬ 
ment of Plate 2 (Mercator format). 



Plate 4. Spheres reflecting the environment of Plate 2. Lefthand 
sphere is a Lambertian reflector, righthand sphere is a perfect mirror. 



Plate 7. Omnimax projection of the environment of Plate 2. 

ABSTRACT 

The overall goal of our work is human motion understanding. In 
particular, motion performance, observation, description, and 
notation impact the form of a motion representation. A 
representation can be verified by a computer graphics performance, 
and thus the effective control of natural-appearing human figure 
movement is a significant and challenging goal. Characteristics of a 
computationally realizable human movement representation are 
discussed, including distinctions between hierarchic levels, 
kinematics, and dynamics. The qualitative factors of Effort-Shape 
notation are used to suggest extensions to existing movement 
representations in directions consistent with known characteristics of 
human movement and conventional animation. We show how useful 
and expressive motion qualities may be at least approximated by a 
combination of kinematics and dynamics computations, with kinetic 
control modulated by accelerations and decelerations derived from 
existing interpolation methods. Interactions between motions by 
phrasing, temporal properties, or relationships may be described and 
executed within an appropriately detailed model. 

RESUME 

L’objet de notre étude est la compréhension du mouvement humain. 
Plus particulièrement, le fonctionnement du mouvement, son 
observation, sa description et sa notation ont un impact sur 
l’organisation de la représentation du mouvement. Celle-ci peut être 
contrôlée à l’aide de l’informatique graphique, mais un contrôle 
adéquat du naturel de l’apparence du mouvement du corps humain 
est un défi à relever. Différentes caractéristiques de la réalisation du 
mouvement par informatique sont examinées. On distingue 
notamment la cinématique, la dynamique et les niveaux 
hiérarchiques. Les facteurs qualitatifs d’une notation "Effort-Shape" 
précise sont utilisés pour évoquer l’extension de la représentation 
actuelle du mouvement vers une direction compatible avec les 
caractéristiques courantes du mouvement et de l’animation. Nous 
démontrons comment certaines qualités significatives du mouvement 
peuvent être approximées par la dynamique et la cinématique avec le 
contrôle de la cinétique modulée par l’accélération et la décélération, 
ces deux dernières étant dérivées par les méthodes d’interpolation 
conventionnelles. L’interaction entre l’expression du mouvement et 
les propriétés temporelles peuvent être décrites et exécutées selon les 
limites d’un modèle pertinemment détaillé. 

KEYWORDS: Human movement, motion understanding, movement 
representation, computer animation, simulation, computer graphics, 
dynamics. 

EXTENDED ABSTRACT 

INTRODUCTION 

A significant portion of our activities and perceptions are 
associated with the performance, observation, description, or 
recording of human movement. It is a challenge to the current state 
of knowledge in Computer Science to similarly represent, simulate, 
and integrate these differing manifestations of human movement 
since they touch on such seemingly diverse areas as computer 
graphics, computer vision, robotics, and computational 
linguistics [6]. In this exposition we shall discuss the philosophy and 
methodology behind our research into the computational 
understanding of human movement, concentrating on the issues of 
movement representation, movement synthesis, and task 
specification. While our primary emphasis will be on performance, 
that is, the animation or simulation of natural human motion, we 
cannot avoid inquiring what our representational decisions would 
imply for a general theory of human movement understanding. 

We will try to examine human movement in the most global 
view possible, namely, that a movement representation should be at 
least sympathetic to the needs and character of each modality: 
performance (or control), observation, language description, or 
symbolic recording. Our own research, and certainly that of others, 
has touched all these areas: for example, computer graphics for 
human motion synthesis [9,16,65,33,41,38, 21], computer vision 
for motion and shape analysis [46,1,36], movement notations for 
symbolic motion representation [29, 63,9, 15], language analysis for 
motion verb characterization [45,4,23], and robotics for path 
planning and goal-directed behavior [35,34]. Having originally 
examined motion descriptions based on visually-observable data [4], 
the inadequacy of this view by itself is keenly felt. Such descriptions 
may serve as a target for information reduction, but are apt to be the 
product of convenience dictated by the observational task at hand. 
Such a description differentiates between phenomena of interest, 
possibly incorporating rudimentary notions of direction, velocity, 
and shape. Likewise, representations derived solely from 
language [56] omit essential information needed to reconstruct an 
acceptable performance. 

By turning to representations derived from graphical 
performance or physical object control (for example, robotics), we 
get a different perspective. In particular, representations based on 
these end products will have the property that a graphical or physical 
performance will verify that a representation is adequate to 
characterize some (hopefully broad) class of human movement. It is 
this adequacy that permits experimentation based on empirical data 
(say from observed movements) and parametric variation to control 
or tune the result. 

The role of natural language descriptions is to expose the 
salient features of human motion interpreted (by a culture) as 
significant events. In particular, we find language has evolved rich 
verb and adverbial vocabularies to permit the description and 
expression of subtle movements. In fact, language goes even further 
by imputing behavior, emotion, and intent to movement, even when 
that motion is not obviously attributable to human-appearing 
agents [44]. While such information is available subconsciously via 
our cognitive systems, it may also be instantiated in language (or 
physically acted out, for that matter). Therefore we assume the 
existence of a transformation which maps some of these 
subconscious perceptions into tangible (and essential) components of 
a motion representation. It appears that some of this information can 
be captured; how much is not clear, though we will propose a model 
here. 

Finally, we use movement notations as a source of symbolic 
representations derived from empirical observation and analysis over 
many years by many observers of numerous subjects. The impact of 
such systems is that they provide one of the only possible bases for 
establishing completeness: that is, does the representation cover, in 
its variations, the known scope and range of human movement? 
Language also provides some of this scope, but does not lend itself 
so readily to analysis. 

We proceed by examining some of the representational issues 
which arise in considering the influence of these requirements. 

REPRESENTATION REQUIREMENTS 

Movements of human or robot agents may be characterized at 
many different levels. A purely geometric level of description as 
changing coordinates, though necessary, is insufficient as a 
comprehensive basis for understanding motion. A simple gesture 
such as closing the hand may be described by joint angles, by paths 
of the fingertips, by flexion of muscles, by the concept "grasp," or by 
the intention "shake hands." Each type of description is useful in 
different contexts, and a natural hierarchy of levels seems to appear. 
To discuss a movement representation therefore is to establish what 
descriptive levels are important and what attributes or characteristics 
are adequate to completely "cover" the space of possible movements 
at each level. We will return to this issue later, after establishing a 
plausible representation scheme in which to formulate higher level 
motion or action descriptions [22]. 

Viewing movement hierarchically focuses attention on 
descriptive or conceptual levels, that is, the refinement or 
generalization of a movement at a different level of detail. 
Performance of a particular motion, however, requires the interaction 
or combination of effects from many sources. While geometric 
object descriptions lend themselves to a hierarchic view [18, 8,42], 
motions are dictated by simultaneous interacting influences. Muscle 
tension, external forces, joint limits, path constraints, expressive 
purpose, intention, and the context of temporally adjacent activities 
all affect human movement. A more general approach to movement 
understanding therefore would cover at least the following aspects of 
a motion. 

• The geometry, kinematics, and dynamics of the agent. 

The individual differences in people and their anthropometry 
must be taken into account. Motion is significantly affected by 
the kinematics of jointed objects, such as joint limits, 
reachable points, and comfort zones. Dynamics describes the 
force or effort influencing motion, whether actual or 
perceived, and may be independent of motion path. Dynamics 
also involves the inherent strength of the agent to initiate or 
resist motion. 

• Any goal-directed or intentional acts the movement was part 
of. 

Much human motion is intentional, even if unconscious: the 
achievement of reach goals, negotiation through a space, 
maintenance of balance, and comfortable distribution of 
weight. 

• The agent’s attitude toward the environment, and its general 
mode of behavior. 

The interpretation of any particular motion is highly dependent 
on the environmental and personal context: thus a 
"threatening" gesture in a social context may be merely 
"defensive" in an athletic one. Motions which are part of an 
ongoing task or activity may be perceived as more global 
entities rather than isolated movements. 

• What, if anything, it signified. 

For example, sign language research [31] shows that certain 
seeming variations in a movement are understood as the same 
sign, while others are not. Often movements along the same 
spatial path and toward the same spatial goal may signify very 
different intents, such as "touch," "press," and "punch." 

• Any synchronization or concurrency relations the movement 
depends on or is derived from. 

Motions may occur in isolation, in sequence, in parallel, or in 
any overlapped or superimposed combination. Some of these 
relationships were studied in the motion context [10], in 
language [2,62], and in task-level reasoning [61,22]. They 
may also overlap, mask, dominate, accentuate, or modify one 
another, as has been demonstrated with facial 
motions [52,53]. The movements may occur compressed or 
extended in time, or be subject to environmental constraints or 
control requirements. For example, the actions of a group of 
athletes on a team are subject to the rules of the game as much 
as to the particular instantaneous circumstances of the play. 

Of course, these factors are not orthogonal to one another, but 
interact and interrelate in complex ways. Part of our task is therefore 
to organize motion information so that we can hope to control 
motion to the extent that the different factors can be investigated at 
appropriate levels. 

The central "core" of the movement understanding 
methodology is a movement representation and its interpretation by 
computer simulation. The reason we insist upon interpretation will 
be clarified further in the next section. In succeeding sections we 
will examine particular aspects of the motion representation and 
show how each component is essential to effective motion synthesis 
and how its semantics might be implemented. 

MOVEMENT REPRESENTATION 

In keeping with the general concerns expressed above, we 
enunciate several criteria deemed essential to the design of an 
effective motion representation. To focus the effort, we will define a 
movement representation as a system in which any movement may be 
decomposed into "primitives" with implementable semantics. We 
require these primitives to meet certain constraints: 
• descriptive significance 

This issue implies that mere visual images are not sufficient 
for a motion representation; even an extensive "film library" is 
not in the form of primitives that may be readily used as the 
basis for simulating arbitrary motion patterns. There is no 
index upon which similarity or differences between two 
motions may be easily judged. There may not even be 
agreement between observers as to the name or type of motion 
being performed. The fact that most imagery is two- 
dimensional is an additional complication, but if the images 
were from multiple viewpoints or even holographic, the 
objection would still stand. 

A similar objection can be raised to descriptions consisting of 
natural language text. Though there may be cultural 
agreement on the meaning of an utterance, the actual process 
of converting the description to action may be subject to 
widely varying interpretations, for example, via "acting." 

• modifiability through generally accessible methods 

This issue implies that a motion representation must permit the 
symbolic or computational modification of a motion primitive 
in order to create a wide class of related or similar motions. 
"Generally accessible" implies eliminating choices such as 
libraries of artist-drawn animations, since the creation of 
natural-appearing hand-drawn animations is not a widespread 
skill. At the minimum, this constraint argues for parametric 
descriptions, though we need not commit to a specific set of 
parameters yet. 

• independence of specific individuals 

This issue again rejects the film or artist-drawn library 
approach, and also disallows more detailed but still joint- or 
segment-specific motion data collected from an individual. 
Thus while such motion may be used as the basis of a specific 
animation [15,24,21], it is not obvious how such a motion 
would change if it were applied to another individual with 
different body dimensions, weight, strength, posture, etc. 

• independence of specific motion characteristics 

This issue emphasizes the need for a parametric approach, 
though now the problem is the motions within an individual 
and the possible ways they can be combined, compounded, 
executed in parallel or sequence, inhibit or permit other 
motions, etc. Thus the primitives must describe possible 
actions of body components and be subject to synchronization 
and modification by other primitives. In addition, we expect 
physical factors to be separable: for example, the path of a 
motion should be separable from the kinetics of motion along 
the path. Again, representations of the library type cannot deal 
effectively with the computational explosion of possibilities 
inherent in arbitrary human motion. 

In constructing a movement representation we have been very 
concerned with its capabilities to describe sufficient information for 
a "performance" by computer synthesized graphic images [9]. This 
point of view has been very fruitful in deciding what characteristics 
of a movement description, and hence of an adequate representation, 
are necessary. The important concept is that movement synthesis 
considerations demand consistent implementable semantics. If a 
computer system could produce any movement specified by the 
appropriate descriptive parameters, then it would also verify that a 
representation was an adequate knowledge base with which to 
describe or notate observed movement. Thus, for example, if the 
representation cannot express the differences between "press" and 
"punch," it would not have sufficient means to distinguish these 
actions if actually observed. 

Symbolic representations of many movement properties are 
found in Labanotation [29], a movement notation system originated 
over 50 years ago by Rudolf Laban. Though several notation systems 
exist, few come close to meeting the criteria for a movement 
representation. We initially studied Labanotation [9], basing the 
choice on several factors deemed essential for effective motion 
specification: 

• its redundant means of expressing a movement 

• its methods for handling sequence, concurrency, and phrasing 

• its capabilities for arbitrary frames of reference 

• its incorporation of goal-directed actions 

• its essentially "digital" symbol system. 

We abstracted these Labanotation properties into a set of five 
"primitive movement concepts" [63] (directions, revolutions, 
facings, shapes, and contacts) concerned only with the location and 
relations of body joints or surfaces in space. Significantly, these 
primitives do not cover dynamic effects (force, acceleration, torque, 
etc.), muscular movements (bulges, contractions, etc.) or facial 
expressions [52,48]. Thus a motion specification in this system 
actually describes the final goal and some constraints on the path 
rather than the internal method by which it is achieved [5]. 
Directions generally describe positions to be achieved by body parts, 
or directions in which the entire body is to move. Revolutions 
include rotations and twists by given angles. Facings are goal- 
directed rotations which require a body surface to achieve a desired 
orientation. A shape is either a path along which a body or body part 
moves, or a spatial shape (position or configuration) which some 
subset of the body is to achieve. Contacts are generally relationships 
such as touches, supports, contains, etc., between two or more 
bodies, body parts, or environmental points. All the primitives 
share notions of duration, fixed end, and reference coordinate 
system. 

We have recently come to view movement somewhat 
differently. The evolution of this early motion representation is 
motivated not only by current efforts in three-dimensional computer 
animation [38,41], but also by practice in 
robotics [50,40, 27,49,20] and motion analysis [46, 60]. We 
distinguish four different kinds of movement primitives: 

• "Changes": rotations by a given angle or translation along a 
given path or direction 

• "Goals": achievement of a given location and/or orientation for 
a body point [35] 

• "Paths": curves in space along which points may move 

• "Dynamics": kinetics or forces which control or affect a 
motion 

The former "primitive movement concepts" are easily subsumed into 
the first three of these four primitives. The new primitive, dynamics, 
will be discussed in the next section. A comparison of the categories 
of the "old" representation [9,63] with respect to this new 
representation appears in Table 1. 

In Table 1, a reach refers to the kinematic achievement of a 
location in space by some body point and an orientation to the 
kinematic achievement of an orientation of a body point. The "key- 
parameter" concept refers to a set of parametric values for particular 
manipulable variables of the body such as joint angles, reach 
position, body location, etc. 

Table 1: Comparison of "old" and "new" movement representations. 

"old"                  | "new" 
-----------------------+------------------------------------------------------ 
DIRECTION (movement)   | Change in position 
DIRECTION (position)   | Reach goal 
REVOLUTION (rotate)    | Change in orientation 
REVOLUTION (twist)     | Change in orientation 
FACING                 | Orientation goal 
SHAPE (movement)       | Sequence of reach goals or "key-parameter" locations 
SHAPE (position)       | "Key-parameter" positions 
CONTACT                | Sequence or set of reach and orientation goals 

Changes, goals, and paths must have associated with them 
durations, starting times, and reference coordinate systems. We can 
assume that the original specification is adequate in that regard [63]. 
Items such as fixed ends of a reach goal are indicated by zero 
changes in that body point in an appropriate coordinate reference 
frame [5,25]. Thus the shoulder might be the fixed end for an arm 
reach to position and orient a hand with respect to some object. The 
former contact primitive is subsumed into time-marked sets of one or 
more goals achieved sequentially and in parallel as needed. The 
semantics of determining those goals is left to a higher level 
process [7,65,23,22]. 

The task of synchronizing concurrent actions and handling 
multiple constraints is passed to a control system rather than being 
explicitly embedded in the representation. A parallel control 
algorithm had been advocated earlier for this purpose [9]. The 
essential features of this control were 

• joint "processors" which interpreted parallel streams of motion 
primitive "instructions" as programs, 

• a special processor to handle movements of the center of 
gravity, and 

• a global monitor to synchronize local changes to a global, 
constrained, body model and thus process concurrent 
overlapping motion primitives. 

We can relax the control model by viewing the body parametrically, 
that is, any specified point on the body may be controlled by 
specifying a sequence of one or more values over time for it. Paths 
are themselves a sequence of parameter values. The parameter 
values may be affected by more than one primitive, for example, the 
position of the body’s center of gravity may be affected by the path 
of movement, inertia, and external forces [9,25]. It is the 
responsibility of the animator and the simulation semantics to resolve 
any discrepancies [10,53]. The particular interactions of the 
dynamics primitives are new and will be examined carefully in the 
next section. 

DYNAMICS 

A key feature of human movement virtually ignored in earlier 
representation efforts is its dynamic quality, that is, the manner in 
which the body moves in terms of force, effort, exertion, energy, etc. 
This may be more significant, in an expressive or intentional sense, 
than the actual path. For example, variations in dynamics can alter 
the message conveyed in American Sign Language [31,39]. 
Dynamics considerations appear only implicitly in the 
representations derived from the study of movement notation 
systems because: 

• Labanotation (or for that matter, nearly any other notation) 
does not convey dynamic information other than timing 
(duration) and perhaps accent, 

• Motion semantics have been mostly concerned with visually 
smooth implementation of each primitive motion, not of the 
details of that motion during its execution nor with its 
continuity in the context of temporally adjacent motions, and 

• The computational models must include capabilities for 
understanding some minimal physics associated with body 
mass, force, inertia, gravity, balance, etc. [10]. 

Computer animation done without concern for motion dynamics 
looks flat or mechanical at best; discontinuous or jerky at worst. 

Previous efforts at incorporating dynamics into computer 
generated animation have focused on explicit velocity or acceleration 
functions [43,58,17,26], artist-drawn keyframes [14,54], smooth 
spline functions [57,59,32], or actual human 
dynamics [15, 11, 66, 24]. The problem has been investigated more 
mathematically in mechanics [47,30,51], biomechanics [55], and 
robotics [27, 37, 13,28], though the latter has been much more 
concerned with computational efficiency. Recently, such techniques 
have been applied to human or articulated figure 
dynamics [3, 64, 25]. Our own examination of the dynamics 
problem has focused on alternative notation systems combined with 
physical and graphical motion models suited to the complexity of the 
human figure. 

In searching for a representational basis for the dynamic 
qualities of movement, we examined a notation system 
complementary to Labanotation called Effort-Shape 
notation [19,12]. Unfortunately, the semantics of this system are not 
defined quantitatively, so we have interpreted it freely to produce 
something more amenable to computation. We believe this to be a 
reasonable approach since our intent is not to "computerize" Effort- 
Shape, or another notational system as we and others have attempted 
to do. Rather, we use these systems to aid in comprehending the 
scope and variety of human movement so that our representations are 
more likely to cover the space of possibilities. In the remainder of 
this discussion we describe the influence of dynamics considerations 
on a motion representation and sketch possible implementations of 
its semantics. 

SUMMARY 

The need for better animation control is apparent from the 
literature. The qualitative factors of Effort-Shape notation are being 
used to suggest extensions to existing movement representations in 
directions consistent with known characteristics of human movement 
and conventional animation. We show how the motion qualities may 
be at least approximated by a combination of kinematics and 
dynamics computations, with kinetic control modulated by 
accelerations and decelerations derived from existing interpolation 
methods. In addition, the interactions between two motions by 
phrasing are handled explicitly by modifications expressed in the 
representation. Temporal, spatial, and relationship interactions may 
be described and executed within an appropriately detailed model. 

Several animation systems are running or are under 
development at the University of Pennsylvania to demonstrate the 
feasibility and efficacy of these approaches. We are anxious to 
experiment with them and produce animations showing processes 
involving the interaction of several people with a complexity not yet 
demonstrated elsewhere. 

The Interactive Specification of Human Animation 

Abstract 

This paper describes the Figure Animation Project in 
progress at Simon Fraser University. The project has two 
goals for the specification of figure animation: the first is 
to implement an interactive Figure Animation Test Bed for 
specifying movement at the detail level and the second is 
to develop a mechanism for describing figure animation at 
the scene level. These goals, and approaches to achieving 
them, are discussed. In particular, the application of 
knowledge-based inference to the problem of scene 
description is examined. 

Introduction 

The long-term goal of this project is to develop scene- 
level motion descriptions for articulated, humanoid figures. 
In general, three-dimensional animation of the human body 
— or of any other vertebrate — is based on an underlying 
framework of articulated elements. For instance, a 
reasonable approximation of the human skeleton can be 
achieved with about 24 segments if the fingers and toes 
are ignored. Most of our efforts over the past few years 
have been directed towards developing interactive techniques 
for specifying detailed figure movements, since a system 
embodying such techniques is a necessary component for 
developing and testing scene descriptions. Such an 
interactive test bed must be capable of displaying the full 
range of interesting scene actions: this range includes both 
the actions of individual joints, and the movements of the 
body as a whole. 

Above the detail level are the motivations of the 
characters and their interactions with the environment. 
This is animation at the scene level. After about ten 
years of continuous research in this area we are still not 
sure whether truly convincing animation of the full range 
of human movement is feasible, but there is no question 
that progress has been made in certain specialized areas of 
figure animation. What is needed now is the guidance of 
an intelligent supervisor to tie together these many pieces 
of the animation problem. 


To coordinate the activities of a human animation system 
we require a supervisory program that has knowledge of 
the overall goals of the characters in the scene. In 
particular, such a program needs to deal with the issues of 
path planning and with the physical constraints on jointed 
figures. Ideally, it should only be necessary to specify the 
character’s motivation in the scene and its initial position. 
The figure would then progress automatically and 
characteristically to its most likely destination from that 
starting point. In reality, the problem of determining an 
appropriate path from knowledge of the character’s 
intentions is difficult, as is the problem of having the 
character navigate around both the fixed objects and the 
other characters in the scene. This latter problem has 
occupied robotics researchers for many years. The problem 
of sophisticated route planning is particularly difficult 
when the characters are jointed walking figures. Not only 
does the figure have to move about unencumbered, but the 
feet must also pick their way around and over any objects 
that may be found on the floor. While performing this, 
the figure must maintain its balance and a reasonable 
posture. This is a particular problem when shifting weight 
smoothly from one limb to another. 

There are many physical constraints on what a character 
can and cannot do in a scene. For instance, a real figure’s 
anatomy and physiology impose limitations on how it can 
move and interact. An intelligent supervisory program for 
human figure animation must understand these limitations; 
in this way, only physically realizable scenes will result. 

Who Uses the System 

The Figure Animation Test Bed is used mostly by 
choreographers who work in the disciplines of skating and 
dance. This test bed system has been designed to evaluate 
the various interactive techniques — buttons, menus, pick- 
and-drag, rendering speed traded off against image quality 
— that can be applied in a figure animation system. Since 
the people best qualified to judge the effectiveness of a 
system's user interface are themselves the future users, we 
select the techniques with which these choreographers feel 
most comfortable. 

The major users of the scene-level animation system are 
expected to be film and theater directors. The intelligent 
supervisory program requires expert knowledge on which to 
base its decisions. This knowledge is developed through 
cooperation between the computer scientists who developed 
the system and the directors who possess the expert 
knowledge. 

Project Background 

Over the years, many techniques have been applied to the 
problem of specifying the complexity of human movement. 

Rotoscoping 

One basic approach is to capture real movement patterns 
from a live subject. This can be done by "rotoscoping": 
digitizing by hand the joint co-ordinates of all body 
segments from at least two orthogonal views recorded on 
film or video. This approach, while accurate, is tedious. 
It is important in biomechanics research, and there is 
continuing interest in automating it, but the pattern 
recognition problems involved are difficult. 

Live movement can also be captured in real time with 
special instrumentation. Goniometers provide a cumbersome 
but relatively inexpensive method [Calvert 80]. Expensive 
video scanning systems such as WATSMART and 
SELSPOT [Ginsberg 82], on the other hand, allow subjects 
to move freely in space; their actions are tracked by time- 
multiplexed light-emitting diodes attached to their joints. 
The movement patterns digitized with any of these 
methods can be normalized for speed and body size, and 
can be stored to create a library of fundamental movement 
patterns. 
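
The text does not say how the normalization is carried out. One simple possibility, sketched below purely as an assumption, is to resample each recorded pattern to a fixed number of frames (removing the effect of performance speed) and to divide coordinates by the subject's height (removing the effect of body size). The function name and the use of linear interpolation are illustrative choices.

```python
from typing import List, Sequence


def normalize_pattern(samples: Sequence[Sequence[float]],
                      body_height: float,
                      n_frames: int) -> List[List[float]]:
    """Resample a recorded pattern to n_frames and scale by body height.

    samples: one row per recorded frame, each row a list of joint
    coordinates measured in the same units as body_height.
    """
    if len(samples) < 2 or n_frames < 2:
        raise ValueError("need at least two samples and two output frames")
    out = []
    last = len(samples) - 1
    for i in range(n_frames):
        # Position of output frame i on the original frame axis.
        u = i * last / (n_frames - 1)
        k = min(int(u), last - 1)   # index of the sample at or before u
        w = u - k                   # blend weight toward sample k + 1
        row = [((1 - w) * a + w * b) / body_height
               for a, b in zip(samples[k], samples[k + 1])]
        out.append(row)
    return out


# Three recorded frames of one coordinate, stretched to five frames.
library_entry = normalize_pattern([[0.0], [2.0], [4.0]], 2.0, 5)
```

Patterns stored in this form can be rescaled to a new figure and replayed at a new tempo by reversing the two steps.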

Notation 

Another way movement patterns can be specified is with 
notation. While human movement can be described by a 
number of dance notation systems, such as 
Labanotation [Smoliar 77] [Calvert 78] [Ryman 83], Eshkol- 
Wachman notation [Eshkol 79] and Benesh notation [Singh 
83], none of these deal directly with the problem of 
describing human movement in an unambiguous way. This 
is because all of these systems are intended for use by 
trained dancers and, as a result, they normally leave out 
numerous details that these dance experts consider obvious. 
The knowledge these artists bring to bear on the 
interpretation of a Laban or Benesh score is an example of 
the sort of expert knowledge needed by any functioning 
movement interpreter, be it human or machine. 

Our own experience with Labanotation has shown that it 
is a viable way to specify animation. Labanotation has 
the definite advantage that it relies on the animator's 
conceptualization of the movements required, and it 
certainly lends itself to the development of complicated 
scores. However, the basic commands are at too low a 
level, and users have trouble predicting the outcome of 
commands. Even with the addition of a macro capability, 
it is still tedious (some might compare it to programming 
in assembly language). Not even dancers find it easy to 
learn. 

These systems are capable of describing a movement in 
arbitrary detail. But since they lack the grammatical 
structure needed for the construction of higher-level 
primitives from simpler components, this capability is not 
enough. Thus the important characteristic of extensibility 
is missing (in any really useful form) from existing 
movement-notation systems. 

Interactive Positioning 

A third approach (after rotoscoping and notation) is 
interactive positioning. This is the basis of the Figure 
Animation Test Bed, and involves the interactive 
specification of body positions in a 3-D graphics 
environment. The user is presented with a space-filling 
vector representation of the human body on the screen of 
a graphics workstation. The body can be viewed from any 
angle — in perspective — under the control of a mouse or 
equivalent device. The mouse selects body segments and 
orients each segment in three-dimensional space. The end 
result is directly equivalent to the output from notation, 
but the user has direct visual feedback and finds the 
adjustments to be natural and intuitive. 

Several attempts have been made to produce integrated 
systems for animating human movement [Calvert 80] 
[Calvert 82] [Calvert 83] [Badler 82] [Badler 79a] 
[Zeltzer 82] [Nichol 83]. Most of these have not directly 
addressed the problem of specifying the movement involved 
in a high-level, extensible way. Instead, the movement is 
described at what may be termed an "assembly-language" 
level, where it is difficult to collect (or "abstract") 
detailed movements together into complex actions. 
Although a simple form of macro-expansion is available in 
the system developed at SFU [Calvert 78], this still does 
not provide enough power to develop a complex hierarchy 
of movement concepts. 

More fruitful work has been done in the area of the 
interface between an animation system and its users. 
Foley and Van Dam [Foley 82] describe the fundamental 
elements of good interactive design in terms of the 
conceptual, semantic, syntactic and lexical design levels. 

The conceptual design level describes the user model of 
the system; that is, the key concepts that the user must 
understand in order to make full use of the proposed 
system. The names and relationships of the objects in the 
system make up this level. In the context of our scene- 
level animation system, the conceptual level includes such 
things as characters, stage directions and props. 

The semantic design level specifies the set of functions 
that the system is expected to perform. In our system, 
these functions include the ability to display a particular 
scene, the ability to learn new stage directions and 
character names, and so on. 

At the syntactic design level, the rules that specify the 
acceptable sequences of input tokens (and the functions 
that they trigger) are set out. In the scene-level system, 
these rules include the detailed protocol of menus, 
windows, valuators and text that will control the scene 
display and the management of the expert system. 

Finally, the lexical level specifies the input tokens that 
the system recognizes. These include text tokens (such as 
"walk", "run" and "stage left"), graphical tokens (sketches 
and selected menu-items) and gestural tokens (movements 
of the user's body). 

There is considerable interest in automating the 
development of a user interface, given its specification in 
some formal language. Such an automatic system is 
described by Olsen [Olsen 84]. Here a formal specification 
of a user interface can be used to generate a Pascal 
program that implements the functions of that interface. 
The input, in this case, is in the form of a grammatical 
description of the interface specification, which is entered 
by the system designer. 

System Configuration 

There are two prime requirements for an interactive 
environment in which the body positions are to be 
specified. The first is for realistic, three-dimensional 
visualization of the spatial orientation of the figure; the 
second is for fast motion checking. To meet these 
requirements, we employ an IRIS 2400 Workstation as the 
heart of our hardware system configuration. A machine of 
this power, while expensive, is essential for smooth 
interactive positioning; a very large number of 
transformations is needed to represent two fully articulated 
figures. Hardware graphics power is also very important 
for fast and smooth motion checking. For these two 
needs, vector-based machines, such as those produced by 
IMI or Evans and Sutherland, could also have been used. 
However, in addition to fast line drawing and matrix 
computation capabilities, the IRIS contains a 32-bit frame 
buffer with both smooth shading and Z-buffer hidden 
surface elimination available in hardware. These 
capabilities are useful while rendering the final animation. 

Using the Figure Animation Test Bed 

When the user develops a piece of choreography on the 
computer she performs four main steps. These steps are: 

• to design the sequence of phrases, 

• to interactively generate the keyframes, 

• to interpolate the intermediate frames, and 
finally, 

• to motion test the result. 

When the choreographer is satisfied, the resulting 
animated sequence may be rendered onto film. 

Sequences 

In our terminology, a piece of choreography is referred to 
as a sequence. A sequence in turn is composed of any 
number of phrases. A particular sequence is defined by a 
list of phrase names that are saved in a text file along 
with other global information that pertains to the 
sequence as a whole. This includes the number of phrases 
in the sequence and the number of frames in each phrase. 
Note that the same phrase may be re-used in any number 
of different sequences. 
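
As an illustration of this organization, the sketch below models a sequence as a named list of phrase names together with the per-phrase frame counts. The class name, field names, and example phrase names are hypothetical; the actual text-file layout is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceFile:
    """Global information for one piece of choreography (assumed layout)."""
    name: str
    phrase_names: List[str]        # phrases in playing order
    frames_per_phrase: List[int]   # number of frames in each phrase

    @property
    def phrase_count(self) -> int:
        return len(self.phrase_names)

    @property
    def total_frames(self) -> int:
        return sum(self.frames_per_phrase)


# The phrase "spin" is stored once and referenced by both sequences.
short_program = SequenceFile("short", ["glide", "spin"], [48, 24])
long_program = SequenceFile("long", ["spin", "jump", "spin"], [24, 36, 24])
```

Because a sequence holds only phrase names, editing the stored phrase "spin" would change every sequence that refers to it.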

A phrase in turn is composed of a group of keyframes. 
Three keyframes are currently used to define each phrase 
since an approximating third-order spline is used to 
generate the intermediate positions. Each keyframe position 
is generated interactively on the screen of the IRIS 2400. 
At the start of this interactive process the user is 
presented with an image of two figures in a standard 
initial position. Each figure is represented by a vector 
image, where each body segment is modeled by a four-sided 
prism. It is important to use at least a crude space-filling 
model like this, in order to give the user feedback 
on limb rotation and on the contact between body 
segments. Hidden lines are not removed. However, 
adjacent body segments are drawn with different colours to 
aid discrimination. The body parts themselves are 
positioned individually by a pick and drag procedure. 

Interaction 

To begin the interactive process, the user initially selects 
the foot which will act as the support point for the entire 
figure. Then the body position is built up by orienting 
each limb segment in turn, moving away from the support 
point. Using the mouse, a body segment is picked and its 
three angles of orientation are adjusted in turn; the user 
can change the angle of view at will. A digital readout of 
the angles is given at the bottom of the display. A 
mouse-based valuator generates these analog values for the 
figure’s joint angle orientations, which in turn determine 
the orientation of the body parts presented in real time on 
the screen of the system console. This provides immediate 
feedback to the system user. After the approximate body 
position has been achieved, the user will iteratively refine 
the inter-segment angles until the desired result is obtained. 
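
Building the pose outward from the support point amounts to accumulating segment vectors along a chain. The sketch below is a planar simplification, assuming a single orientation angle per segment rather than the three used in the Test Bed; the function name and segment lengths are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chain_positions(support: Point,
                    segments: List[Tuple[float, float]]) -> List[Point]:
    """Return joint positions along a chain starting at the support point.

    segments: (length, angle_deg) pairs ordered outward from the support
    point; each angle is measured from the horizontal axis.
    """
    x, y = support
    joints = [(x, y)]
    for length, angle_deg in segments:
        a = math.radians(angle_deg)
        x += length * math.cos(a)
        y += length * math.sin(a)
        joints.append((x, y))
    return joints


# Lower leg and thigh of the supporting leg, both vertical.
joints = chain_positions((0.0, 0.0), [(0.45, 90.0), (0.45, 90.0)])
```

Changing the angle of one segment moves every joint further from the support point, which is why segments are oriented in order away from the supporting foot.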

To specify a second frame, the user can either start with 
a standard position, as she did with the first frame, or the 
first frame can be copied and used as the starting point 
for the second. Similarly a third frame is specified. 
These three frames form a phrase which is given a name 
and stored. Multiple phrases are built up in turn, 
typically using the last frame of one phrase as the first of 
the next. Very little typing need be done by the 
choreographer while interacting with the system. Instead, 
software buttons guide the user through the sequence of 
steps that result in the final animation. 



Figure 1: Figure Animation Test Bed Screen 

In this screen, the user has picked the left thigh of the right 
figure, and is about to change its orientation. 


Currently, the user can specify the positions of two 
figures at a time. At this point the information available 
is equivalent to that which can be obtained by interpreting 
Labanotation commands or from instrumentation; thus data 
from different methods of movement specification can be 
combined. 


Interpolation 

Once the keyframes have been collected into phrases and 
the phrases assembled into a sequence, the “inbetweening” 
stage is performed. In this step a smooth series of frames 
connecting the keys is produced. A form of curve fitting 
based on third-order splines — one for each joint of the 
body — is used to determine the appropriate angle in space 
between the body parts articulated by that joint. Joint 
angle interpolation is very compute-intensive, requiring 
approximately 2000 floating point operations per frame. 
By formulating the cubic curve-fit in terms of matrix 
multiplication, the array-processing capabilities of the IRIS 
"Geometry Engines" can be used to solve the interpolation 
problem. Using these pipelined matrix processors, 1500 
intermediate frames can be computed per minute. 
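
The matrix formulation can be sketched as follows. The basis actually used is not stated in the text; the example assumes a uniform cubic B-spline segment (an approximating spline) and takes four control values per joint as given, since the way the three keyframes of a phrase map onto control values is not described. Each joint angle is then the product T * M * G of a row of powers of t, a constant basis matrix, and a column of control values.

```python
from typing import List, Sequence

# Uniform cubic B-spline basis matrix; the 1/6 factor is applied in
# evaluate(). The choice of this basis is an assumption.
B_SPLINE = [
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
]


def evaluate(controls: Sequence[float], t: float) -> float:
    """Evaluate one joint angle at t in [0, 1] as T * M * G."""
    if len(controls) != 4:
        raise ValueError("a cubic segment needs four control values")
    powers = [t ** 3, t ** 2, t, 1.0]   # row vector T
    # T * M gives the four blending weights for this value of t.
    weights = [sum(powers[r] * B_SPLINE[r][c] for r in range(4)) / 6.0
               for c in range(4)]
    return sum(w * g for w, g in zip(weights, controls))


def inbetween(controls: Sequence[float], n_frames: int) -> List[float]:
    """Sample n_frames evenly spaced angles along the segment."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [evaluate(controls, i / (n_frames - 1)) for i in range(n_frames)]


# Hypothetical control angles (degrees) for one joint.
frames = inbetween([10.0, 20.0, 30.0, 30.0], 5)
```

At the quoted rate of 1500 frames per minute and roughly 2000 operations per frame, the interpolation runs at 25 frames, or about 50,000 floating point operations, per second.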


Viewing 


Having completed the interpolation, the final step is to 
view the resulting action. This involves selecting one of 
the display options that trade off rendering speed against 
image quality. These rendering options include line 
drawings, filled polygons, and smoothly shaded solids. 

Line Drawings    This results in the fastest rendering. By 
                 drawing each body part as a rectangular 
                 prism, frame rates of about 4/sec can be 
                 achieved. Faster frame rates (as high as 
                 15/sec) can also be achieved using a 
                 simpler stick figure. 

Filled Polygons  An intermediate level of rendering 
                 quality is obtained by using filled 
                 polygons to represent the body parts. 
                 This requires hidden surface elimination, 
                 increasing the rendering time to two 
                 seconds per frame. 

Smooth Shading   The highest quality rendering requires 
                 the use of smooth shading and an 
                 accurate lighting model. However, 
                 flexible, jointed figures produce special 
                 problems for any solid modeling 
                 technique; joint coverage is particularly 
                 difficult. 


We have experimented with using spheres as building 
blocks for constructing the figures [Badler 79b]. The 
figures are built up using a Constructive Solid Geometry 
(CSG) approach where the primitives are shaded coloured 
spheres of varying diameter. There are over 800 spheres 
in each body and each sphere is rendered with a polygon 
approximation which takes account of the lighting for each 
sphere. The shading is obtained with the Gouraud method 
and the hardware z-buffer in the IRIS provides hidden 
surface removal. 

The resulting image contains over 400,000 smooth-shaded 
polygons per frame and requires 2-3 minutes to render on 
the IRIS 2400. Obviously, this is much too slow for 
previewing the motion of the figures, and so this technique 
is used only for frame by frame reproduction onto 16mm 
film. 
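
A rough calculation shows what these figures imply. Two inputs below are assumptions rather than statements from the text: that both figures appear in each frame, and that the film runs at 24 frames per second.

```python
SPHERES_PER_BODY = 800          # "over 800" in the text
POLYGONS_PER_FRAME = 400_000    # "over 400,000" in the text
FIGURES = 2                     # assumption: both figures are in the frame
FILM_FPS = 24                   # assumption: 24 frames per second of film

# Average size of the polygon approximation of one sphere.
polygons_per_sphere = POLYGONS_PER_FRAME / (SPHERES_PER_BODY * FIGURES)


def render_minutes(film_seconds: float, minutes_per_frame: float) -> float:
    """Rendering time, in minutes, for a given length of film."""
    return film_seconds * FILM_FPS * minutes_per_frame


low = render_minutes(1, 2)    # one second of film at 2 minutes per frame
high = render_minutes(1, 3)   # one second of film at 3 minutes per frame
```

Under these assumptions each sphere is approximated by about 250 polygons, and one second of finished film takes between 48 and 72 minutes to render.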

Work In Progress 

The facilities of the Figure Animation Test Bed system 
are being extended and the quality of figure rendering is 
being improved. Also, the design and programming of the 
scene-level animation system is under way. 

Animation Test Bed 

The test bed system has been used by a figure-skater 
and by a dancer as an experimental tool. This has already 
resulted in some significant segments of animation. Both 
users have expressed a very definite preference for 
interactive specification over the use of Laban-style 
notation. As a result of their experience, two significant 
needs have been identified: 

• The movement patterns in our animated films 
result from the smooth interpolation of 
keyframe body positions. One problem is that 
the result is too smooth to be credible as 
human movement. As a result, we are 
investigating methods of adding small 
oscillations to the interpolated data in the 
Test Bed system. Each new movement 
generates an oscillatory transient and a small 
"wobble" is present at all times. In this way 
the movements achieve a more natural quality. 

• At present, the path traced out by a figure 
results from accumulating the individual actions 
(stepping, jumping, gliding) of the character. 
This makes higher-level path planning difficult. 
A higher-level route planning facility is being 
developed as part of the scene-level animation 
system. 
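
One way to picture the first of these needs is as a perturbation added to each interpolated joint angle: a continuous low-amplitude wobble plus a decaying transient that restarts with each new movement. The functional form and every parameter value below are assumptions made for illustration; the method under investigation is not specified in the text.

```python
import math


def add_oscillation(angle_deg: float, t: float,
                    wobble_amp: float = 0.5, wobble_hz: float = 2.0,
                    transient_amp: float = 2.0, transient_hz: float = 4.0,
                    decay_s: float = 0.3) -> float:
    """Perturb an interpolated joint angle.

    t is the time in seconds since the current movement started.
    """
    # Small oscillation that is present at all times.
    wobble = wobble_amp * math.sin(2 * math.pi * wobble_hz * t)
    # Oscillation triggered by the new movement, dying away exponentially.
    transient = (transient_amp * math.exp(-t / decay_s)
                 * math.sin(2 * math.pi * transient_hz * t))
    return angle_deg + wobble + transient


# Perturbed samples of a joint held at 45 degrees, at 24 samples per second.
samples = [add_oscillation(45.0, i / 24.0) for i in range(48)]
```

The perturbation is zero at the start of the movement and never exceeds the sum of the two amplitudes, so the keyframe positions are only slightly disturbed.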

Director’s Apprentice 

There is a long term interest in developing systems that 
can interpret very high level descriptions of animated 
scenes. A good real-world model of such a description is 
the film script. The knowledge needed to make sense of 
such a description requires the use of a knowledge base 
composed of rules. These rules will come from the 
knowledge and experience of an expert director. We have 
named this project The Director's Apprentice, as it will 
"learn its craft" by studying the rules and practices of a 
human director. This learning process will take place as it 
aids him in the task of developing an interactive 
storyboard. 

The knowledge base of directing principles, used to 
interpret the script, will contain a collection of 
"if-then-else" rules. These rules map scene attributes (such as 
character motivations, script text and classic directing rules) 
into scene action. Of course, this mapping is neither 
unique nor well-defined, and it will result in numerous 
conflicting interpretations for each combination of 
attributes. A rule-interpreting inference engine must then 
arbitrate these conflicting conclusions, and decide on some 
reasonable resulting action for the scene. There are 
many categories of rules and concepts that such a system 
will need to address. These concepts include the 
inter-character feelings that dominate motivation, and the 
appropriate shifting of audience focus from one character to 
the next as the plot progresses. Knowledge of directing 
terminology (stage right, up stage, down stage) and 
standard set compositions will be needed to support the 
script interpretation. 
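
The directing terminology mentioned above can be grounded in floor coordinates. The sketch below assumes a coordinate convention that is not given in the text: +x points to the audience's right and +y points away from the audience. Stage directions are named from the actor's point of view when facing the audience, so stage left lies to the audience's right. The table and function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed floor coordinates: +x is the audience's right, +y points away
# from the audience.
STAGE_DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "stage left": (1, 0),
    "stage right": (-1, 0),
    "up stage": (0, 1),
    "down stage": (0, -1),
}


def move(position: Tuple[float, float], direction: str,
         distance: float) -> Tuple[float, float]:
    """Return the floor position reached by moving in a stage direction."""
    dx, dy = STAGE_DIRECTIONS[direction]
    return (position[0] + dx * distance, position[1] + dy * distance)


# A character at centre stage takes two units toward stage right.
new_position = move((0.0, 0.0), "stage right", 2.0)
```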

Since considerable knowledge of the scene constraints is 
needed, the process of abstracting the detailed description 
of movement is not a trivial matter. For example, the 
phrase "John walks across the room and stops at the other 
side of the desk" may be used to describe many different 
scenes: the exact scene would depend on the arrangement 
of the furniture, on the other people in the room, on 
John’s starting position and on his particular gait. To 
achieve the level of descriptive power needed for an 
effective directing language, an expert or rule-based system 
is being developed to provide the knowledge of scenes, 
physiology, habits, and so on, that is needed to abstract 
the directing concepts. 

Expert systems in AI have been developed largely to 
solve problems involving the deduction of answers from a 
set of facts stored in a knowledge base. Such a knowledge 
base contains the accumulated knowledge of one or more 
experts in that particular field. But how does this differ 
from a conventional data-base management system? Perhaps 
the most important capabilities that distinguish an expert 
system are: 

1. Inference: the ability to form a long, complex 
conclusion using the information present in the 
knowledge base [vanMelle 81]. This knowledge 
base contains both relations — as in a 
conventional relational data base — and the 
rules of inference that permit the system to 
infer new relations from those that were 
initially given. 

2. Self-knowledge: the ability of the system to 
understand and explain interactively the 
structure of its own data base and its own 
inferential mechanism [Davis 82] [Davis 80]. 

3. Flexibility: the ability of the knowledge base 
to grow and adapt to new knowledge as a 
result of correcting comments made by an 
expert consultant [Winston 81]. 

Expert-system projects have been developed for many 
problem domains, but only a few of these were ever taken 
to the point of being really useful [Nau 82]. Of these, 
probably the most successful implementation was the 
MYCIN [Shortcliffe 76] [Davis 82] project, a question- 
answering system intended for the problem of medical 
diagnosis. MYCIN was based on the use of 
production-system methodology, where the knowledge base 
consists of productions (or rules) of the form 
"if A_1 is true, 
and if A_2 is true, 
... 
and if A_n is true, 
then C_1 is true." 
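
A production of this form can be represented as a set of antecedents paired with a conclusion. The sketch below applies such rules with a simple forward-chaining loop, chosen for brevity; MYCIN itself reasons backward from goals. The rules and fact names are invented for illustration and do not come from any actual knowledge base.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]   # (antecedents A_1..A_n, conclusion)


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Fire rules repeatedly until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            # A rule fires when all of its antecedents are already known.
            if conclusion not in known and antecedents <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical directing rules.
RULES: List[Rule] = [
    (frozenset({"john in room", "desk in room"}), "john can reach desk"),
    (frozenset({"john can reach desk", "script says walk"}),
     "john walks to desk"),
]
derived = forward_chain(
    {"john in room", "desk in room", "script says walk"}, RULES)
```

Because each rule is a self-contained pair, rules can be added to or removed from the list without touching the interpreter.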

The applicability of production systems to the 
accumulation of expert knowledge has been discussed by 
Langely [Langely 83]. This work points out that the 
inherently modular nature of information that has been 
described in terms of productions makes the task of 
incorporating new information into the structure much 
easier. The considerable success of the MYCIN project in 
answering complex diagnostic questions (measured against 
the performance of real "expert" physicians) led to the 
development of EMYCIN [Davis 82]. This consists of the 
pure "expert system" of MYCIN, but stripped of the 
medical-diagnosis data base. It led to further successful 
tests of the production-system mechanism in other subject 
domains (such as civil engineering and geology [vanMelle 
81]). 

The success of production-based expert systems largely 
derives from the ability of these systems to encapsulate 
specific facts in the knowledge base in the form of 
individual productions [Davis 82], [Langely 83]. The 
advantages of this are that local, specific changes can be 
made to the knowledge base (adding, modifying or deleting 
individual rules) and the changes' effect on the other rules 
in the system can generally be predicted in a straightforward 
manner. In fact, the same mechanism can also be 
applied to the strategies used by the reasoning system 
itself, as described in [Davis 80]. This opens the 
possibility of having the search-strategy mechanism itself 
defined in terms of productions, which can be added, 
modified, and deleted easily. 

The production system in MYCIN was augmented by its 
ability to assign a certainty factor to the conclusion of a 
rule. These "confidence measures” range from 100% ("is 
certain to”) down to -100% ("is certain not to”). This 
involves having the rule interpreter combine the certainty 
factors of the subordinate rules to form the certainty of 
the conclusion. The use of continuous-valued (or fuzzy) 
logic in the inferential mechanism derives from the 
observation that few conclusions in any real domain can be 
made with absolute certainty [Zadeh 83]. Statements such 
as "If the character is John then there is an 80% chance 
that he will walk quickly across the room, and there is a 
20% chance that he will walk slowly across the room," may 
be important when acquiring empirical judgements from 
human directors. 
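
The combination of certainty factors can be sketched as follows. The formulas are the ones commonly described for MYCIN and EMYCIN, not something stated in this paper, so treat them as an assumption. Certainties are scaled to the range -1 to 1 rather than -100% to 100%, and the function names `conclude` and `combine` are hypothetical.

```python
def conclude(rule_cf, premise_cfs):
    """Certainty contributed by one rule.

    The rule's own strength is scaled by its weakest premise; a premise
    that is believed false (negative certainty) contributes nothing.
    """
    weakest = min(premise_cfs)
    return rule_cf * max(0.0, weakest)


def combine(cf1, cf2):
    """Merge two certainties that bear on the same conclusion."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("contradictory certainties (+1 and -1)")
    return (cf1 + cf2) / denominator


# "If the character is John then there is an 80% chance that he will
# walk quickly": the rule strength is 0.8; the premise certainties
# (1.0 and 0.9) are made-up values.
walks_quickly = conclude(0.8, [1.0, 0.9])
# A second, independent rule supporting the same conclusion at 0.5.
merged = combine(walks_quickly, 0.5)
```

Two supporting rules reinforce each other without ever exceeding full certainty, while evidence of opposite sign partially cancels.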

The function of an expert system is to answer questions 
using expert knowledge and reasoning. The "questions” 
that the Director’s Apprentice will try to answer will arise 
during the interpretation of the script. For example the 
sentence "John walks quickly across the room, and stands 
behind the desk" will generate the following questions for 
the expert system to answer: 

1. Is John presently in the room? 

2. Are there any obstacles between John’s present 
position and the desired final position? 

3. If so, what is the best path for him to follow 
around, over, or under those obstacles? 

4. What sort of motion, if any, is meant by “walks 
quickly”, or by “stands still”? 

Issues of route planning, relating to the solution of 
problems (2) and (3), have been discussed by Lozano-Perez 
[Lozano-Perez 80]. More difficult questions relating to 
the meaning of the film, such as “Is it in character for 
John to kick over a chair on his way to the desk?” 
require the presence in the database of detailed information 
about the psychological and emotional aspects of the action. 
These considerations have been discussed by Fleischer 
[Fleischer 84]. 

Many recent expert systems have been concerned with 
the problem of using expert knowledge to make sense of 
sentences expressed in a natural language. Since we believe 
that free-form natural language is an inappropriate 
interface for an interactive graphics system, the linguistic 
issues that these systems address (pronoun references, 
multiple-clause sentences, and others) are not directly 
relevant to this research. Instead of an unconstrained 
natural language interface, we are implementing a 
system-directed conversation driven by menu-picking. 
The advantages of this approach have been described by 
Rich [Rich 84]. 

Interacting with the Director's Apprentice 

Interaction with the Director's Apprentice involves two 
steps. The first step is to design a set of rules that 
capture the facts and relations inherent in the key directing 
concepts — this forms the knowledge base. The second 
step is to query the knowledge base using the control 
panel of the Director's Apprentice inference engine. 


Most stage action in a play consists of explicit 
commands to the actors, such as "exit stage left", "approach 
the upstage character" and so on. This is termed inherent 
movement. However, stage actions may often be inferred 
from a script even if that script contains only dialogue. 
This is termed imposed movement. Many excellent texts 
(for example Allensworth [Allensworth 82]) have been 
written for theater students, detailing how certain types of 
action may be generated from a knowledge of the flow of 
dialogue from one speaker to the next. 

The initial set of rules that have been tried in the 
knowledge base of the Director's Apprentice deal with the 
theatrical concept of focus. Focus concerns the use of 
subtle character action to shift audience attention from one 
character to the next, generally one step ahead of the next 
change of speaker. This is done so that the audience will 
have time to settle its "focus" on a character before he 
begins to speak; otherwise, his first few words or gestures 
may be lost to those of the audience who are watching 
someone else. The following are some ways that may be 
employed to achieve shift of focus, based on the known 
flow of dialogue in a script. 

Figure 2: Focus by Position (stage diagram; Up = upstage, Down = downstage) 

Here, the principal actor (the next to speak) has been given 
focus by having the other actors stand further upstage. 

In focus by position, the principal actor (that is, the 
actor who is just about to begin speaking) receives focus 
by walking downstage, or by having the other actors walk 
upstage. This puts him in an attention-grabbing downstage 
position relative to the others. Encoded as a knowledge 
base, this action may include such rules as: 

IF next-to-speak is actor_i, 
AND actor_j is-downstage-of actor_i, 
THEN j moves-upstage-of i. 

Many other rules, defining the concepts "is-downstage-of" 
and "moves-upstage-of", would also be required. 

In actual line focus, the principal actor receives focus by 
having the other actors align themselves in a virtual 
"arrow” that points at him. To handle actual line focus, 
the knowledge base would need to contain rules such as: 

IF next-to-speak is actor_i, 
AND actor_j is-not-aligned-to actor_i, 
AND actor_j is-closest-to actor_i, 
THEN align-actor j,i, 
AND next-to-be-aligned-is j 

IF next-to-be-aligned is actor_k, 
AND actor_l is-closest-to actor_k, 
THEN align-actor l,k, 
AND next-to-be-aligned-is l 

and so on. 
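
The chaining behaviour of these rules can be sketched in Python. This is a hypothetical procedural rendering, not the Director's Apprentice rule interpreter. It assumes actors are points on a 2D stage plan and that "is-closest-to" means smallest Euclidean distance among actors not yet aligned. It computes only the order in which the `align-actor` actions would fire, not the resulting stage geometry.

```python
import math


def alignment_order(positions, speaker):
    """Return (actor, target) pairs in the order the chained rules fire.

    The first pair aligns the actor closest to the speaker; each later
    pair aligns the actor closest to the one aligned just before it.
    """
    remaining = {name for name in positions if name != speaker}
    order = []
    anchor = speaker  # plays the role of next-to-be-aligned
    while remaining:
        # sorted() makes tie-breaking deterministic
        nearest = min(
            sorted(remaining),
            key=lambda name: math.dist(positions[name], positions[anchor]),
        )
        order.append((nearest, anchor))
        remaining.remove(nearest)
        anchor = nearest
    return order


# Made-up stage positions (x, y) for three actors.
positions = {"john": (0.0, 0.0), "mary": (1.0, 1.0), "tom": (3.0, 2.0)}
pairs = alignment_order(positions, "john")
```

With John as the next speaker, Mary (the nearest) is aligned to John first, and Tom is then aligned to Mary, which is how the "arrow" grows outward from the principal actor.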
In visual line focus, the principal actor receives focus by 
having the other actors turn to face him. To handle this 
focusing rule, the knowledge base would need to contain 
rules such as: 

IF next-to-speak is actor_i, 
AND actor_j is-not-facing actor_i, 
AND actor_j is-closest-to actor_i, 
THEN turn-to-face-actor j,i, 
AND next-to-face-is j 

IF next-to-face is actor_k, 
AND next-to-speak is actor_i, 
AND actor_l is-not-facing actor_i, 
AND actor_l is-closest-to actor_k, 
THEN turn-to-face-actor l,i. 

At present, an ordinary text editor is used to enter these 
rules into the knowledge base. A structured rule editor is 
planned. 

The inference engine for the Director's Apprentice runs on 
a SUN Workstation, with a high-resolution (1150 X 890) 
bit-mapped screen. The various functions of the rule 
interpreter and query system are invoked by menu picks 
on a series of adjustable panels (windows) that pop up 
under control of the inference mechanism. This way, it is 
hoped that directors using the system will adapt quickly to 
the interactive dialogue. 

Improved Rendering 

The current rendering technique, based on a body built 
up from spheres, leaves much to be desired. The use of a 
skin approximated with a polygon mesh promises both the 
improved application of modern lighting models and 
smoother surfaces. However, when human figures are 
rendered by existing polygon modelers, the results are 
generally stilted and cartoon-like. Closeups of bending 
joints are particularly difficult to render smoothly. A 
robust method to move the control points of a skin 
derived by splines is being developed. The goal is to 
achieve a skin which stretches naturally as the body 
segments move relative to each other. 

Conclusions 

Many areas of research need to be explored further, at 
both the detail level and scene level. At the detail level, 
more study needs to be done on the best ways to 
interactively specify figure movement. The present system 
is continually being revised in response to user comments 
and suggestions. 

At the scene level, a great many more rules will be 
needed to effectively implement change of focus. Other 
forms of imposed action are being studied as well. In 
order to put the knowledge base on a firm theoretical 
foundation, we are also looking into developing a formal 
semantic model (as described in Delgrande [Delgrande 86]) 
of directing concepts. 
Goal Directed Animation using English Motion Commands 

Karin Drewery 
John Tsotsos 
Dept. of Computer Science 
University of Toronto 


ABSTRACT 

This paper describes a prototype 3D animation system which can execute limited types 
of English motion commands by solving simple goals and directions. 

This system has a frame-based knowledge base (KB) to describe objects and another to 
describe motions. A hierarchical planning system uses the motion descriptions as forward 
production rules to form a plan for a goal task. Directional commands relative to the object 
or a reference object can also be processed by referring to the directional description in the 
motion KB and the object’s reference frame. 

The underlying objective is to develop a method to incorporate goals into a graphical 
animation system so that it will be a task level system [Zelt85], in which a behaviour is 
described in terms of events and relationships, freeing the user from specifying all details 
of a motion. 

Keywords : KB graphics, animation, motion description, goals. 


1. Background 

Developing animation systems which are task level 
systems is a relatively new research area in computer 
graphics and a complete one does not yet exist. 
Currently some animator level systems [Zelt85], in which 
the behaviours of the objects are described algorithmically 
in a programming language, exhibit properties that 
would be useful to a task level system. 

In Reynolds’ actor-based [Reyn78] and Murtagh’s 
object-based [Murt85] systems, objects can pass messages, 
which allows adaptive motion. Adaptive motion 
occurs when the control processor uses information 
about the objects and their environment to control the 
objects’ movements [Zelt85]. In MIRA [Thal83, 
Magn84, Magn85] attributes of the objects can be 
updated and examined, also allowing adaptive motion. 

Badler [Badl80, Badl81, Badl82] has been involved 
in representing human motion and developed Tempus. 
This system contains resolved motion algorithms for 
limb positioning and approaches task level animation. 
Zeltzer [Zelt82, Zelt83] developed SAS, which uses local 
motor primitives (LMPs) to execute movement functions 
which have preconditions, and has described ideas 
for a KB system which would be a task level animation 
system. 

In the meantime work in Artificial Intelligence 
concerning motion descriptions was being done. Using 
Miller’s [Mill72] classification of motion verbs, Badler 


[Badl75] and Tsotsos [Tsot80] added their own 
modifications to create motion description and analysis 
systems. The KB for our system GEMS was derived 
from a modification of the frame based system proposed 
by Tsotsos. 

A frame [Mins75, Gold79] is a representational 
structure which consists of slots which can describe parts 
of the item being represented and can contain information 
describing the relations between the slots. The 
slots form a PART-OF hierarchy. The frames form an 
IS-A hierarchy defining general to specific information 
(see example 1). 

A frame based system was proposed because it can 
express hierarchical descriptions, allows general to 
specific levels of detail and conveys inheritance of properties. 
A frame which is below another frame in the 
IS-A hierarchy inherits the properties of the one(s) 
above it. 
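
Inheritance along the IS-A hierarchy can be sketched in a few lines of Python. The `Frame` class and its method names are hypothetical and are not the GEMS implementation; the frame names are borrowed from Example 1, while the slot names `can_move` and `has_joints` are invented for illustration.

```python
class Frame:
    """A frame with local slots and zero or more IS-A parents."""

    def __init__(self, name, parents=(), slots=None):
        self.name = name
        self.parents = tuple(parents)
        self.slots = dict(slots or {})

    def lookup(self, slot):
        """Return a slot value, searching up the IS-A hierarchy if needed."""
        if slot in self.slots:
            return self.slots[slot]
        for parent in self.parents:
            try:
                return parent.lookup(slot)
            except KeyError:
                continue
        raise KeyError(slot)

    def is_a(self, other):
        """True if this frame is `other` or lies below it in the hierarchy."""
        return self is other or any(p.is_a(other) for p in self.parents)


mobile = Frame("MOBILE", slots={"can_move": True})
linked = Frame("3D_LINKED_OBJECT", slots={"has_joints": True})
robot = Frame("ROBOT", parents=(linked, mobile), slots={"head": "HEAD"})
```

Here `robot.lookup("can_move")` is answered by the MOBILE frame, so a property stated once at a general level is available to every more specific frame.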

GEMS arose by incorporating a motion processing 
queue scheme similar to Zeltzer’s [Zelt82] to process 
the information in the frames of the KB. 


2. Overview 

The user must first define a KB for 3D hierarchical 
objects and a KB for their motion descriptions. To 
begin an animation the user must instantiate objects 
from the frames defined in the object KB. 
The next phase involves a motion processor which 
accepts motion commands in this format: 

subject motion-verb reference | directional adverb 

or 

agent motion-verb subject reference | directional adverb 

The command is parsed and existence checks on 
the objects are done by searching the object KB. 
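
A minimal sketch of this parsing step is given below, in Python. It is not the GEMS parser. It assumes whitespace-separated tokens, a small invented vocabulary of directional adverbs (`DIRECTIONAL_ADVERBS`), and an object KB reduced to a set of instantiated object names.

```python
# Assumed vocabulary; the paper does not list the adverbs GEMS accepts.
DIRECTIONAL_ADVERBS = {"up", "down", "left", "right", "forward", "backward"}


def parse_command(text, known_objects):
    """Split a motion command into its roles and check that objects exist."""
    tokens = text.split()
    if len(tokens) == 3:      # subject motion-verb reference|adverb
        agent = None
        subject, verb, last = tokens
    elif len(tokens) == 4:    # agent motion-verb subject reference|adverb
        agent, verb, subject, last = tokens
    else:
        raise ValueError("expected 3 or 4 words, got %d" % len(tokens))

    command = {
        "agent": agent,
        "subject": subject,
        "verb": verb.upper(),
        "reference": None,
        "direction": None,
    }
    if last.lower() in DIRECTIONAL_ADVERBS:
        command["direction"] = last.lower()
    else:
        command["reference"] = last

    # Existence check against the (simplified) object KB.
    for name in (agent, subject, command["reference"]):
        if name is not None and name not in known_objects:
            raise KeyError("unknown object: " + name)
    return command


objects = {"RobotA", "roomB", "boxC"}
exit_cmd = parse_command("RobotA EXIT roomB", objects)
push_cmd = parse_command("RobotA PUSH boxC left", objects)
```

The second call uses an invented verb, PUSH, only to exercise the four-word form with an agent and a directional adverb.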

The task manager then consults the motion KB to 
process the motion verb. This is done by traversing the 
hierarchies described in the frame structured KB. From 
the hierarchical descriptions, the motion verb is broken 
down into its underlying primitives, which are internally 
defined procedures. If the motion has preconditions 
then the planning system may be called. 

The planning system is modelled after ABSTRIPS 
[Sace74, Nils80], a simple hierarchical planning system. 
A hierarchical planner was chosen because in many 
situations a subgoal condition of a goal can be regarded 
as a detail and does not need to be solved until the 
major steps of the plan are solved. Thus the plan is 
developed level by level. 

A goal directed motion task in the KB will have a 
precondition list with priorities assigned to each precondition 
to indicate which ones should be solved first. A 
precondition is an action which must be executed, or a 
state which must hold, before the task can be done. 

The task will also have delete and add state lists. 
The delete list refers to states which are no longer true 
after the task has been completed and the add states are 
those which are now true. These lists are used by the 
planner to keep track of the current state situation. 

The planner begins with the initial state situation 
and applies the appropriate motion tasks, which are 
actually forward production rules, to achieve the 
desired goal. A task is selected to be a possible element 
of the plan if a state in its add list matches the top of 
the goal stack. If this task has preconditions whose 
priorities are greater than or equal to the current maximum 
priority value then they are placed on the stack. Otherwise 
the current state description is updated by removing 
states which are in the task’s delete list and adding 
those listed in the add list. The chosen sequence of 
motion tasks forms a plan to achieve the original goal. 
Details of the planning algorithm can be found in the 
references mentioned. 
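
The role of the precondition, add and delete lists can be illustrated with a small Python sketch. It is a deliberately flattened simplification: it solves preconditions recursively in listed order and omits the priority levels of ABSTRIPS, so it is not the GEMS planner. The `Task` class, the task set and the state names are hypothetical, loosely modelled on the EXIT example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    preconditions: tuple  # states that must hold before the task runs
    add: tuple            # states true after the task
    delete: tuple = ()    # states no longer true after the task


def apply_task(state, task):
    """Update the state description using the delete and add lists."""
    return (state - set(task.delete)) | set(task.add)


def plan(goal, state, tasks, depth=10):
    """Return (steps, new_state) achieving `goal`, or None if none is found."""
    if goal in state:
        return [], state
    if depth == 0:
        return None
    for task in tasks:
        if goal not in task.add:
            continue
        steps, current = [], state
        for pre in task.preconditions:
            sub = plan(pre, current, tasks, depth - 1)
            if sub is None:
                break
            sub_steps, current = sub
            steps += sub_steps
        else:
            # A later sub-plan may have undone an earlier precondition.
            if all(p in current for p in task.preconditions):
                return steps + [task.name], apply_task(current, task)
    return None


TASKS = [
    Task("STAND_UP", (), ("STANDING",), ("SITTING",)),
    Task("APPROACH", ("STANDING",), ("NEAR_DOOR",)),
    Task("OPEN_DOOR", ("NEAR_DOOR",), ("DOOR_OPEN",)),
    Task("EXIT", ("INSIDE", "STANDING", "NEAR_DOOR", "DOOR_OPEN"),
         ("OUTSIDE",), ("INSIDE", "NEAR_DOOR")),
]
steps, final_state = plan("OUTSIDE", {"INSIDE", "SITTING"}, TASKS)
```

Starting from a seated robot inside the room, the sketch produces the sequence STAND_UP, APPROACH, OPEN_DOOR, EXIT. Any state not named in a task's add or delete list is carried over unchanged.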

The user may specify precisely which motion will 
supply the precondition needed, by using an if statement. 
Alternatively, the user may specify only the states 
that are preconditions, and the planner will determine 
the motions that will achieve these states. Using the 
plan and hierarchical information from the KB, a 
motion queue for each object that was referenced in the 
command is built. 

A clock is run and, at evenly spaced time intervals, 
checks are done on each queue to determine which 
motion, if any, the object should be undergoing. Interaction 
conditions are evaluated as the motion is executed 
and may cause nodes to be added to or removed from the 
queue. During this, current state information must also 
be updated. For each frame of the animation a file containing 
the appropriate transformations and object data 
is created and can later be passed to a rendering system. 


3. Object Description 

An object frame consists of a description and a 
dependent section. The description section contains slots 
which define an object’s parts. A part has a name and a 
type which is either another user defined frame in the 
object KB or a system defined object primitive such as a 
cube, a sphere, a vector etc. If the object’s parts are 
joined then the joint may be specified by the use of 
a connected-to expression. Constraints on the rotation 
angles can be defined. A slot may also contain constraints 
on dependent variables. 

An object’s geometry type is indicated by a slot 
type or in an "IS-A" expression. The type can be a 3D 
primitive or a user defined polyhedron or mesh, both of 
which can be defined in the KB or input from a file. 

The dependent section contains definitions for variables 
which are dependent on the object’s structure. 
An object in this system must have a centroid, a bounding 
box and direction vectors. Direction vectors are 
formed from the centroid to each face of the bounding 
box to discern the top, bottom, left, right, front and 
back of an object. 
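
The direction vectors can be sketched as follows, in Python. This makes two assumptions not stated in the paper: the centroid is approximated by the centre of an axis-aligned bounding box, and the axes follow the convention x = right, y = top, z = front. The function name `direction_vectors` is hypothetical.

```python
def direction_vectors(bbox_min, bbox_max):
    """Return the box centre and a vector from it to each face centre."""
    centre = tuple((lo + hi) / 2 for lo, hi in zip(bbox_min, bbox_max))
    hx, hy, hz = ((hi - lo) / 2 for lo, hi in zip(bbox_min, bbox_max))
    vectors = {
        "right": (hx, 0.0, 0.0),
        "left": (-hx, 0.0, 0.0),
        "top": (0.0, hy, 0.0),
        "bottom": (0.0, -hy, 0.0),
        "front": (0.0, 0.0, hz),
        "back": (0.0, 0.0, -hz),
    }
    return centre, vectors


# A 4 x 6 x 2 box centred on the origin, matching the body dimensions
# used in Example 1.
centre, vectors = direction_vectors((-2.0, -3.0, -1.0), (2.0, 3.0, 1.0))
```

A directional command such as "move up" could then be resolved by translating the object along its `top` vector, expressed in the object's own reference frame.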

Example 1 shows some of the frames which could 
be used to describe a robot with joints. Notice that the 
constraints on the slots appear between the square 
brackets. 


4. Motion Description 

A motion frame consists of a description, dependent, 
preconditions, interactions, delete and add state sections. 
The description section (similar to an object’s) contains 
slots which describe the parts of the motion. Each slot 
contains a name or label for the part followed by a type 
which is either another user defined motion from the 
KB or a primitive motion type (rotate, translate, scale). 
This may be followed by constraints on the type’s 
dependents such as timing or speed. 

There is also a subj slot to allow the user to define 
the type of object that exhibits the motion. An agent 
slot allows the user to specify the type of object that 
produces the motion. Similarly there is a ref slot where 
the user can define the type of object (if any) that the 
motion references. 

The dependent section contains slots which define 
the parameters or variables of a motion. The label of 
the slot is the dependent’s name followed by its type 
which must be a system defined primitive. 

The preconditions, add and delete lists are used by 
the planning system as mentioned. A precondition 
allows the user to specify in the KB, which motions or 
states must be done before the desired command can be 
executed. Suppose that the command "RobotA EXIT 
roomB" was given. Example 2 describes a frame for 
EXIT and specifies its preconditions. 

In this example the preconditions are listed as 
states. The numbers in the labels indicate their priorities. 
First RobotA must be inside roomB as indicated 
by p1. The second precondition requires that the room 
have a door, otherwise we shall assume that the robot 
cannot exit the room. Then, in order to exit the room 
the robot must be standing and he must be near the 
door. A motion frame such as APPROACH would direct 
the robot to the door. The last precondition requires 
that the door be open. 

The action of exiting the room is done by the 
frame WALKTHROUGH specified in the motion 
description. Notice that after the robot has exited the 
room the states OUTSIDE_OF (subj,ref) and STANDING 
(subj) must be added to the current state list. 

The user must create motion frames in the KB with 
many preconditions to create more realistic goal- 
oriented descriptions. 

Interaction conditions are tested while an action is 
occurring. In Example 3, the frame description for 
FLIGHT of a missile rocket, the interaction condition is 
to check if the rocket contacts any object in space. If it 
does then it will explode. In this example the user 
specifies precisely which action will occur using the if 
statement rather than just specifying states as in the 
previous example. 


5. Conclusions and Extensions 

GEMS presents a method to execute goal-directed 
graphics by using an object and a motion KB and a simple 
planning algorithm. The preconditions, add and 
delete lists in the motion KB enable the system to form 
a plan for the motion. Internal procedures calculate 
directions and relationships, and perform graphical 
motion primitives needed to display the plans. It would 
be a more powerful system if it could define these internal 
procedures. 

The planning system used is limited. It does not 
account for the interaction of many agents which is 
needed in some animations. It is based on state changes 
and assumes that any state not mentioned in the delete 
and add lists is unchanged. A more recent work by 
Stuart [Stua85] has developed the idea of synchronizing 
multi-agent activities. 

To incorporate a larger class of goal oriented commands, 
GEMS should be extended to contain a reach 
algorithm for jointed limbs [ORou78, Kore82] and a 
complex path planning algorithm [Loza79]. 


Example 1 Description of an Object 

Frame ROBOT is-a 3D_LINKED_OBJECT, MOBILE 
Description: 
    head: HEAD [ body connected_to head at (0,5,0) ]; 
    body: 3D_RECT [ xwidth = 4.0; 
                    ywidth = 6.0; 
                    zwidth = 2.0; ]; 
    right_leg: RIGHTLEG 
        [ body connected_to right_leg at (1,6,0), 
          rotx < 90, rotx > -90, 
          roty < 90, roty > -90, 
          rotz < 90, rotz > -90 ]; 
    etc. 
end Frame 

Frame HEAD is-a ELLIPSOID 
Description: 
    c1: [ xradius = 3, yradius = 4, zradius = 3 ]; 
end Frame 
Example 2 for "RobotA EXIT roomB" 

Frame EXIT is-a 3D_SEQUENTIAL_MOTION 
    subj: ROBOT; 
    ref: ROOM; 

Preconditions,Delete: 
    p1: INSIDE (subj,ref); 
    p2: PART_OF (ref,door); 
    p3: STANDING (subj); 
    p4: NEAR (subj,ref); 
    p5: OPEN (ref.door); 

Description: 
    d1: WALKTHROUGH [ ref = ref.door, 
                      dur = 2 ]; 

Add: OUTSIDE_OF (subj,ref), STANDING (subj); 
end Frame 


Example 3 for "Missile FLIGHT to Moon" 

Frame FLIGHT is-a 3D_SEQUENTIAL_MOTION 
Interactions: 
    p1: If (CONTACTS(subj,ANY)) then EXPLODES; 

Description: 
    subj: ROCKET; 
    ref: OBJECT_IN_SPACE; 
    launch: LAUNCH [ start_time = 5, 
                     duration = 30, 
                     speed = 100 units/sec ]; 
    phase1: PHASE1 [ duration = 90, 
                     speed = 200 units/sec ]; 
    phase2: PHASE2 [ duration = 50, 
                     speed = 300 units/sec ]; 
end Frame 
Abstract 

Animation which uses three dimensional computer graphics 
relies heavily on geometric transformations over time for the 
motions of camera and objects. To make a figure walk or make a 
liquid bubble requires sophisticated motion control not usually 
available in commercial animation systems. 

This paper describes a way to animate a model of the human 
face. The animator can build a sequence of facial movements, 
including speech, by manipulating a small set of keywords. 
Synchronized speech is possible because the same encoding of the 
phonetic elements (segments) is used to drive both the animation 
and a speech generator. Any facial expression, or string of 
segments, may be given a name and used as a key element. The 
final animated sequence is then generated automatically by 
expansion (if necessary) followed by interpolation between the 
resulting key frames. 

We present two alternative modelling techniques for 
constructing the face: a polygon mesh and a functional description 
using a technique called soft objects. 


Résumé 

Three-dimensional computer animation relies heavily on 
geometric transformations over time for the movements of the 
camera and of objects. Making a figure walk or a liquid boil 
requires sophisticated motion control, which is not generally 
available in commercial animation systems. 

This article describes a way to animate a model of a human 
face. The animator can build a sequence of facial movements, 
including speech, by manipulating a small set of keywords. 
Speech is synchronized automatically, because the same phonetic 
elements control both the animation and the speech output. Any 
facial expression, or series of segments, can be named and used 
as a key element. Finally, the animation sequence is created 
automatically by expansion (if necessary) followed by 
interpolation between the key frames. 

We present two alternative techniques for constructing the 
face: a polygon mesh, and a functional description using soft 
objects. 


Introduction 

Three dimensional animation using computer graphics often 
suffers from a lack of sophisticated motion. Current modelling 
techniques can produce realistic looking images, but are not suited 
to representing objects in motion. Nor do we have established ways 
to describe complex motion to the computer system. 

The human face is a prime example of an object which moves 
in a very complicated way, that cannot be easily and convincingly 
controlled by simple geometric transformations in time, unless 
constraints are placed on the possible motion. The major work in 
this area was done by Fred Parke at the University of Utah [Parke 
74] and later developed at New York Institute of Technology [Parke 
82]. Parke uses a face built from polygons and identifies groups of 
polygons which can be changed according to a set of parameters to 
control facial expressions and features. A second approach [Platt 81] 
is to use a structure based model where the muscles to be moved are 
described. While simulating the underlying facial muscles allows for 
exact representations of wrinkles and face motions, an adequate 
facial model has not been fully developed using this representation. 
This is due both to the difficulty in encoding all of the facial muscles 
and the complexity of its motion due to the number of degrees of 
freedom allowed the animator. 

Although much work has been done on the modelling of the 
face, synchronized speech animation is still effected through 
rotoscoping or related techniques. 


The Graphicsland Animation System 

The Graphicsland project group [Wyvill, B. 85a] at the 
University of Calgary has developed an organised collection of 
software tools for producing animations from models in three 
dimensions. The system allows the combination of several different 
kinds of modelling primitive [Wyvill et al 85b]. Thus polygon based 
models can be mixed freely with fractals [Mandelbrot 83, Fournier 
82] and particles [Reeve 83] in a scene. Motions and camera paths 
can be described, and animations generated. Note that we do not 
include the use of a two dimensional “paint” system. Our objective 
is always to construct views of a full three dimensional model. 

Our objective in this work was to introduce better techniques 
for motion control than commonly available and integrate them into 
Graphicsland. 
Interfacing to the Parke Model 

The face-representation we used has been developed from Fred 
Parke’s work at the University of Utah [Parke 74]. Parke models 
the face as a collection of polygons which may be manipulated 
through a set of 50 numerical parameters. These control such things 
as length of nose, jaw rotation, shape of chin, and other similar 
facial features, and allow movement of these features by 
interpolating the parameters. 

To describe motion directly using these parameters is clumsy 
and difficult. Motions were described as a pair of numeric tuples 
which identified the initial frame, final frame, and interpolation 
type, followed by the parameter, the parameter’s initial value, and 
the final value. In order to aid the animator, a keyword based 
interface was developed. The interface makes it possible to build up 
libraries of partial expressions (smile and blink would be two partial 
expressions) and place them anywhere within an animated sequence. 
It also has the ability to detect conflict should two simultaneous 
partial expressions attempt to manipulate a facial feature in 
opposing directions at any point in the animation. 


Expression 

We specify each partial expression by means of a set of 
keywords. We must describe: 


1) the part of the face to be moved (eyes, mouth, cheeks...), 

2) the type of movement (open, arch, raise...), 

3) the initial and final frame numbers, 

4) the parameter value (normalized) at the final frame, 

5) and optionally, the type of interpolation (default is linear) 

For example, to open the mouth the dialogue might be: 

open mouth frame 12 25 value 0.8 

This would cause the mouth to open with 80% (0.8) of the 
maximum jaw rotation, beginning at frame 12 and ending at frame 25 
of the animation. Alternatively, motions may be grouped together 
into a key element, for example, a blink expression might be 
specified: 

animate blink 

close eyes frame 1 2 value 0 
open eyes frame 2 8 value 0.9 

end 

Once an expression has been specified, the animator may place that 
motion at several places in the animated sequence: 

add blink frame 25 
add blink frame 36 

Figures 1a, 1b and 1c show three frames from the blink sequence. 
Figures 2, 3 and 4 show various expressions. Figure 4 also has had 
hair grown on the head using a number of particle generators 
distributed on the polygons which define the scalp. 
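
The way a keyword command turns into per-frame parameter values can be sketched in Python. This is only an illustration of the default linear interpolation, not the Graphicsland code. The functions `parse_motion` and `value_at` are hypothetical, and the parameter's value before the motion starts (`start_value`) is assumed to be supplied by the caller.

```python
def parse_motion(line):
    """Parse e.g. 'open mouth frame 12 25 value 0.8' into a dict."""
    words = line.split()
    if len(words) != 7 or words[2] != "frame" or words[5] != "value":
        raise ValueError("unrecognised motion: " + line)
    return {
        "movement": words[0],
        "part": words[1],
        "first": int(words[3]),
        "last": int(words[4]),
        "value": float(words[6]),
    }


def value_at(motion, frame, start_value=0.0):
    """Linearly interpolate the normalized parameter at a given frame."""
    if frame <= motion["first"]:
        return start_value
    if frame >= motion["last"]:
        return motion["value"]
    t = (frame - motion["first"]) / (motion["last"] - motion["first"])
    return start_value + t * (motion["value"] - start_value)


mouth = parse_motion("open mouth frame 12 25 value 0.8")
midway = value_at(mouth, 18)  # 6 of the 13 frame steps completed
```

For the command in the text, the jaw parameter stays at its starting value up to frame 12, rises steadily, and holds at 0.8 from frame 25 onward.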


Speech 

In all work so far, the animation of a talking sequence for one 
of these facial models has been done using a technique similar to 
rotoscoping. A human actor is filmed, reciting the required script, 
and the facial model is constrained to follow the sequence of lip and 
jaw positions needed for each frame of the animation. 
This process is tedious and expensive. Various alternative 
approaches exist, all based on some knowledge of the relationships 
between speech sounds and the configuration of the articulators. 
Articulation models giving information about jaw position, lip 
spreading or rounding, tongue position, and their dynamic 
relationships, for various speech sounds, can be built up from the 
literature on lip reading and acoustic phonetics [Walther 1982, 
Levitan 1977]. Subjective evaluation of the adequacy of such 
models, followed by correction and re-evaluation ensures that the 
models are good functional representations. An animator can use 
such model data to set up key frames corresponding to successive 
segmental articulations. The computer can interpolate the key 
frames according to more or less simple rules, and the resulting 
product can be dubbed in the usual way. Alternatively, the 
articulatory parameters may be controlled by recognition of the 
sounds produced by an actor, which ensures natural rhythm for the 
resulting speech. However, speech recognition is still less than 
perfect, and such systems tend to rely on sound classification that is 
both crude and error prone. Our approach is to synthesise both the 
speech and the sequence of facial expressions by rules, based on the 
articulatory model for the facial movements, and based on rules for 
acoustic synthesis for the speech. Thus phonetic script 
incorporating both segmental and suprasegmental information drives 
both aspects of the speech animation sequence. Synthetic speech is 
still somewhat unnatural, but such speech animation is both 
intelligible and well synchronised. Large quantities of speech 
animation can be generated at virtually no more cost than the 
graphical animation that forms part of it. Furthermore, script 
changes can be incorporated without having to rely on the 
availability of a particular real speaker. The synthesis is based on a 
long-standing research project that includes a new model for speech 
rhythm based on a generalisation of real speech rhythm data [Hill 
1978a, Hill 1978b]. 

Normal speaking rates vary a good deal. Typical segment 
(speech sound) durations for normal speech vary between 50 and 250 
milliseconds. Each articulation changes in a basically continuous 
fashion into the next one. For the acoustic synthesis, piecewise 
linear interpolation of the acoustic parameters, from one target to 
the next, according to relatively simple rules a few time divisions, 
has been found adequate for high quality synthesis. A typical 
sampling rate would be one sample every 10 milliseconds, but linear 
changes over periods of 20 to 80 milliseconds occur. A similar 
approach is being adopted for the interpolation needed for the 
changing facial expression dictated by the moving articulators. The 
rate of interpolation varies, just as for the acoustic synthesis (and in 
synchrony with it), but the rules are few and simple. At 24 frames 
per second, the average frame rate for a movie is approximately one 
frame every 40 milliseconds. This is well matched to the sampling 
rate needed for a fairly accurate representation of the synthetic 
speech, as might be expected from observations of real speech on 
film. 
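
The interpolation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter name `jaw_opening`, its target values, and the function names are hypothetical, and only the piecewise-linear rule and the 24 frames per second rate come from the text.

```python
from bisect import bisect_right

def interpolate(targets, t_ms):
    """Piecewise-linear value at time t_ms.

    targets is a list of (time_ms, value) pairs sorted by time.
    """
    times = [t for t, _ in targets]
    if t_ms <= times[0]:
        return targets[0][1]
    if t_ms >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t_ms)          # first target strictly after t_ms
    (t0, v0), (t1, v1) = targets[i - 1], targets[i]
    return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)

def sample_frames(targets, fps=24):
    """Sample the parameter once per film frame (about every 41.7 ms at 24 fps)."""
    last_ms = targets[-1][0]
    n_frames = int(last_ms * fps // 1000) + 1
    return [interpolate(targets, k * 1000.0 / fps) for k in range(n_frames)]

# Hypothetical articulatory targets: (time in ms, jaw opening from 0 to 1).
jaw_opening = [(0, 0.0), (80, 1.0), (250, 0.3)]
frames = sample_frames(jaw_opening)
```

The same targets could be sampled every 10 ms for the acoustic side, which is one way to keep the two streams in synchrony.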

As an example of speech, the phrase “Hi there” could be 
achieved as follows: 

sentence greet 
h 
ah i 170 0.3 
th 
e 
r 
end 

A sentence is specified and denoted by the name “greet” in order to 
allow the user to refer to the sentence again for modification, 
deletion or placement. The “end” command marks the end of the 
sentence, and allows the system to calculate the number of frames 
required for the speech. The duration and enunciation parameters 
available for each segment are being used with the diphthong “AH 
I”; its duration is now 170 ms and its enunciation in terms of 
mouth position is 0.3 of its full possible range. If these parameters 
are not supplied, the default duration for each element of the 
phonetic script is used. 
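
As a rough illustration of how the “end” command could yield a frame count, the sketch below parses a script of this shape and sums the segment durations. It assumes the diphthong line reads `ah i 170 0.3`, as the prose indicates; the default duration of 100 ms and the default enunciation of 1.0 are hypothetical values chosen for the example, since the paper does not state them.

```python
import math

DEFAULT_DURATION_MS = 100.0  # hypothetical default; not given in the paper
DEFAULT_ENUNCIATION = 1.0    # hypothetical default: full range

def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False

def parse_sentence(script):
    """Return (name, [(symbols, duration_ms, enunciation), ...])."""
    lines = [ln.split() for ln in script.strip().splitlines() if ln.strip()]
    if (len(lines) < 2 or len(lines[0]) != 2
            or lines[0][0] != "sentence" or lines[-1] != ["end"]):
        raise ValueError("expected 'sentence <name>' ... 'end'")
    segments = []
    for tokens in lines[1:-1]:
        # Trailing numeric tokens are the optional duration and enunciation.
        numbers = []
        while len(tokens) > 1 and _is_number(tokens[-1]):
            numbers.insert(0, float(tokens.pop()))
        duration = numbers[0] if len(numbers) > 0 else DEFAULT_DURATION_MS
        enunciation = numbers[1] if len(numbers) > 1 else DEFAULT_ENUNCIATION
        segments.append((" ".join(tokens), duration, enunciation))
    return lines[0][1], segments

def frame_count(segments, fps=24):
    """Number of film frames needed to cover the whole sentence."""
    total_ms = sum(duration for _, duration, _ in segments)
    return math.ceil(total_ms * fps / 1000)

name, segments = parse_sentence("""
sentence greet
h
ah i 170 0.3
th
e
r
end
""")
frames_needed = frame_count(segments)
```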

Once the animator is satisfied with the placement of the 
partial expressions, the sequence may be examined for motion 
conflicts with the command “check”. Should a conflict be 
discovered, the partial expressions which clash are displayed and 
may be edited. 

Figures 5a,b,c,d and e show selected frames produced from the 
sentence “greet”. 


Soft Objects 

The term “soft object” is used to refer to the particular class 
of objects whose shape varies constantly because of forces imposed 
on them by their surroundings. 

We have been experimenting with a general model for soft 
objects which represents an object or collection of objects by a 
scalar field, that is, a mathematical function defined over a volume 
of space. The object is considered to occupy the space over which 
the function has a value greater than some threshold, so the surface 
of the object is an iso-surface of the field function, that is, a surface 
of constant function value within the space considered. The idea of 
using such surfaces for 3D modelling was first put forward by Jim 
Blinn [Blinn 82] and refined in the Graphicsland system [Wyvill 85c]. 
Using the field function developed in [Wyvill 85c], such surfaces can 
be finely controlled by varying the radius of influence and field value 
due to each key point. Our initial experiments suggest that fewer 
key points are needed to control the facial movements than with 
other techniques, and the process is computationally less expensive 
than using B-splines or polygon meshes. 
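
The scalar-field idea can be sketched in a few lines. The falloff polynomial below is a simple stand-in chosen for illustration and is not necessarily the field function of [Wyvill 85c]; the key-point layout, the threshold of 0.5, and all function names are likewise assumptions.

```python
import math

def falloff(r, radius):
    """Contribution of one key point at distance r; zero beyond its radius of influence."""
    if r >= radius:
        return 0.0
    q = (r / radius) ** 2
    return (1.0 - q) ** 3          # hypothetical smooth falloff, 1 at r=0, 0 at r=radius

def field(point, key_points):
    """Sum of the contributions of all key points at a point in space.

    key_points is a list of (centre, radius_of_influence, field_value) tuples.
    """
    return sum(strength * falloff(math.dist(point, centre), radius)
               for centre, radius, strength in key_points)

def inside(point, key_points, threshold=0.5):
    """The object occupies the region where the field exceeds the threshold."""
    return field(point, key_points) > threshold

# Two hypothetical key points whose fields blend into a single soft object.
keys = [((0.0, 0.0, 0.0), 1.5, 1.0),
        ((1.0, 0.0, 0.0), 1.5, 1.0)]
```

Moving a key point, or changing its radius or field value over time, changes the iso-surface smoothly, which is what makes the representation attractive for facial movement.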

Although a polygon mesh is a useful representation for the 
face model, it forms only a crude approximation to the smooth 
curves of a face, and suffers from the problem that current shading 
techniques smooth the centre of the mesh, leaving an un-smoothed 
polygon silhouette edge. B-spline patches [Huitric 85] have been used 
to define a smooth surface, and fewer control points are needed to 
define the face than with polygons. However, a set of soft object key 
points shares these advantages and has several more. Soft object 
control points may have different colours associated with them. The 
colour of a control point affects the colour of a local region of the 
face, and this colour will be smoothly blended into the colours of the 
surrounding regions. A face may contain various areas of different 
colour, for example, rosy cheeks, red lips, a dark chin and pale 
forehead. The colours of control points can be made to vary with 
time, causing smooth changes of colour in selected parts of the face, 
as in a blush. 


Conclusion 

We have presented some experimental work with a face model. 
Different modelling techniques for facial animation have been 
described along with a method of using the model to produce 
synchronized animation directly from a speech synthesizer. The user 
interface to the speech and expression program is particularly simple 
and effective. 
Figure 4. Scream. 
Figure 5. Hi there. 

Abstract 

Virya is an interactive graphical motion control editor for 
kinematic and dynamic animation. Most animation is controlled 
kinematically, by designating objects’ positions taken 
over time without consideration for the causes of the motion. 
An alternative is dynamic motion control, where objects are 
seen as masses moving under the influence of forces and 
torques. Dynamic motion control has some advantages in that 
motion more naturally simulates real world conditions and 
many complex motions can be automatically calculated, 
though calculating motion is quite expensive and control is 
sometimes less intuitive. The editor Virya works both for 
kinematic and dynamic motion control. It has two main tasks: 
to specify control functions representing positions (kinematic) 
or forces and torques (dynamic) controlling motion, and to 
specify control modes which designate how control functions 
are interpreted or whether joints are frozen in place, relaxed, or 
balanced. Using these control modes, the user can designate 
motion using a convenient kinematic method and still use 
dynamic analysis as a final step to constrain and add realism. 

KEYWORDS: computer animation, human modeling, dynamics, 
simulation 


1. Kinematic Motion Control Methods 

Motion control is a central problem in computer animation 
and is the one aspect of animation that most sets it off 
from other areas of computer graphics. Common kinematic 
approaches to motion control are 3-D keyframing, motion control 
functions, parametric control, and animation languages 
(these approaches can be, and often are, combined). 

In 3-D keyframing, the user typically positions the objects 
in the scene interactively, designating a sequence of 
configurations and the times when they should occur [10,11]. 
The animation system then interpolates between these keyframe 
configurations to generate the inbetween configurations. 
3-D keyframing is inherently superior to 2-D keyframing for 
3-D animation because the problems of information loss do not 
occur. However, keyframing is limited by the necessity of 
creating many keyframes and the lack of complete control over 
the interpolation process defining the path and speed of motion 
between keyframes. 
Parametric motion control involves designating certain 
parameters whose values define a particular configuration of 
the objects in the world [5]. For example, in the case of facial 
animation, parameters may designate the position of the 
mouth, the elevation of the eyebrows, etc. [8]. Parameters are 
convenient to use and allow association of reasonably complex 
motions (such as smiling) with one or a few parameters. 
Choosing parameters that cover the desired range of motion 
can be problematic, so the user may have to sacrifice complete 
motion control for ease of use. 

Animation languages are an attractive alternative because 
complex motion can be described in the form of scripts [9]. 
Some languages are fairly low-level and merely provide a convenient 
interface to specify simple motions [7]. Higher-level 
languages would allow the user to specify motion in general 
terms (e.g. "walk forward") and depend upon an intelligent 
hierarchical interpretation system to find the specific low-level 
directions needed to draw the frames [16]. While high-level 
languages may provide the most convenience to the user in the 
long run, at present many issues involved in high-level motion 
control remain unresolved. Again, use of a script may limit the 
amount of control the user has over the motion. 

Kinematic motion control functions represent motion at 
each degree of freedom in the form of position versus time 
curves. The control functions can be simply generated and 
succinctly stored using control points which generate the 
curve. These control functions are low-level and represent 
motion at individual degrees of freedom, but do allow very 
detailed specification of motion. An advantageous feature of 
control functions is that the final motion description of the 
other methods (changes to particular degrees of freedom over 
time) can be easily represented in this form. The use of an 
interactive control function editor allows the user to make individual 
changes to motion at the lowest level and at the last 
minute, and can make up for some of the loss of exact control 
often concomitant with the above methods. 

2. Dynamic Animation 

Most animation systems at present are kinematically based; 
that is, motion is considered as the relation of position 
versus time without consideration of the environmental 
influences causing the motion. Virya, the motion control editor 
described here, was designed mainly for use with the dynamic 
animation system Deva. In Deva, objects are considered as 
extended masses which act under the influence of forces and 
torques. Using dynamic analysis, this relationship is formulated 
as the dynamics equations of motion; the solution of 
these equations is a kinematic description of the motion that 
would occur under the specified conditions in a "real" (simulated) 
world [13,14]. 

Dynamic animation has certain advantages lacking in 
kinematic systems. Motion is automatically constrained to 
respond to environmental conditions. For example, it is 
kinematically difficult to animate a body, such as a human 
figure, naturally responding to collisions, such as hitting the 
ground. One problem is that colliding objects should not move 
through each other. The user can avoid this, at considerable 
expense, by visually checking for unrealistic intersections, 
doing automatic collision detection, or implementing constraints 
using inverse kinematics [4]. Another problem is that 
when an articulated body collides, the motion of its many connected 
segments can be extremely complex and difficult to 
predict. Using dynamics, such collisions can be automatically 
simulated by applying an opposing force against the body 
when it collides. Taking this force into account during 
dynamic analysis results in a natural response to the collision, 
including a certain amount of bouncing and correct reaction of 
other connected body parts. Bodies will also automatically 
respond to gravity or the motion of connected body parts or 
external influences. 

Dynamic animation does have some disadvantages as well. 
It is computationally much more expensive than kinematic animation 
[1,14]. When the system dynamics are complex, as in 
the case of articulated bodies with many degrees of freedom, 
numerical instability can be a problem. Dynamics also requires 
initial determination of object and environmental characteristics 
such as masses, joint limits, scale factors for springs and 
dampers used in collisions, etc. Perhaps the most serious 
disadvantage occurs in simulating controlled motion. While 
bodies respond naturally to environmental conditions, it is 
difficult to find when, where, and how strongly to apply internal 
forces and torques simulating muscles in humans and other 
animals. These issues are explored in detail elsewhere 
[13,14,15]. Suffice it to say that while dynamic animation is 
still in its exploratory stages, it does offer a method with considerable 
potential for simulating realistic motion. 

3. Virya Motion Control Functions and Modes 

Virya is a motion control editor used to develop, modify, 
and store motion control information in the form of control 
functions and control modes. Virya was designed to work 
within the dynamic animation system, Deva. Some of its 
features are only relevant to dynamic animation systems; other 
features are relevant to both kinematic and dynamic systems. 
Deva and Virya were specifically designed to explore the problems 
of controlling the motion of articulated bodies such as 
animals and robots. Though the same principles apply to controlling 
other, simpler objects, discussions will largely be from 
the standpoint of controlling articulated bodies. 

Virya control functions can either be kinematic, representing 
positions over time for each degree of freedom, or 
dynamic, representing forces (sliding joints) or torques (revolute 
joints) for each degree of freedom. In Virya, control functions 
are represented by cubic interpolator spline curves, 
piecewise curves that interpolate a sequence of user-defined 
control points with first and second derivative continuity [2,3]. 
Cubic interpolator splines have the advantage that they interpolate 
specified control points; they have the disadvantages of 
being global (changes in one control point affect the entire curve 
to varying degrees) and given to occasional wild 
behavior. A local interpolator spline, such as that of 
Kochanek [6], or a local approximating spline that can 
approach control points arbitrarily closely such as the beta-spline 
[3], may be more desirable. Control functions are easily 
constructed in Virya by picking control points on the screen 
using a puck and tablet. 
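
A cubic interpolating spline with first and second derivative continuity can be sketched as below. The paper does not state the end conditions Virya uses, so this example assumes a natural spline (zero second derivative at both ends); the function name is hypothetical, and the control points are those listed for one degree of freedom in the sample Virya file.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1                               # number of segments
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                           # second derivatives at the knots
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):                 # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        inner = [0.0] * (n - 1)
        inner[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):            # back substitution
            inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
        m[1:n] = inner

    def evaluate(t):
        t = min(max(t, xs[0]), xs[-1])            # clamp to the defined time span
        i = max(0, min(bisect_right(xs, t) - 1, n - 1))
        a, b = xs[i + 1] - t, t - xs[i]
        return (m[i] * a ** 3 / (6 * h[i]) + m[i + 1] * b ** 3 / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)

    return evaluate

# Control points (time in seconds, position in radians) from the sample file.
times = [0.0, 3.0, 6.0, 7.0, 8.0]
values = [0.0, 0.523599, 0.0, 0.0, 0.0]
curve = natural_cubic_spline(times, values)
```

Evaluating `curve(6.5)` gives a nonzero value even though the control points at 6 and 7 seconds are both zero, which illustrates the global behaviour noted above: the bump at 3 seconds ripples into later segments.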

Each degree of freedom of the body (e.g., flexion of the 
elbow or rotation of the head) can exist in one of five control 
modes for dynamic animation: relaxed, dynamic control, 
frozen, balanced, and hybrid K-D modes. Each degree of freedom 
can alternate between modes during the animation. One of 
Virya's functions is to specify modes and their durations for 
each degree of freedom. A sixth, pure kinematic mode exists 
which completely by-passes dynamic analysis. In this case, 
control functions represent positions over time and are directly 
sampled to produce purely kinematic animation. 

3.1. Relaxed Mode 

Dynamic animations can be developed without any user- 
specified motion at all, merely by placing the body in an 
unstable position and letting it react to the gravitational force, 
its own joint limits, the ground, etc. The degrees of freedom in 
relaxed mode will move freely under environmental conditions 
with no internal controlling force or torque simulating a muscle 
contraction or a robot actuator motor. These relaxed degrees 
of freedom are constrained to remain within their joint limits 
and their motion is slightly damped. Other forces or torques 
due to collisions or motion of other body parts will still act 
upon them. (Within the system Deva, springs and dampers are 
used to mimic both joint limits and collisions.) 


3.2. Dynamic Mode 


To actually control the animation, pseudo-muscular 
forces or torques must be applied to certain degrees of freedom 
of the body. For example, to make the body wave its arm, 
torques must be applied to the shoulder and elbow. The most 
direct way to specify these forces and torques is to develop a 
control function for each degree of freedom whose motion is 
controlled in this way. These control functions represent a 
force (for sliding degrees of freedom) or a torque (for revolute 
degrees of freedom) over time. The control functions are sampled 
to find the appropriate controlling force or torque to apply 
to specified degrees of freedom at each time instant that 
dynamic analysis is done; these controlling forces and torques 
are then added to the automatically calculated forces and 
torques mentioned above to find the total forces and torques 
acting upon the body. 


While these force/torque control functions have the 
advantage of directly specifying the controlling forces and 
torques needed for dynamic analysis, they leave the user at the 
disadvantage of not knowing intuitively what forces or torques 
will be necessary to produce the desired motion. This problem 
is accentuated by the complex interactions between different 
parts of an articulated body. For example, should the user find 
the correct force to lift the arm rigidly at the shoulder, and then 
add torques to the elbow to accompany the lifting by a bending 
action, he will find that the additional torque at the elbow interferes 
with the smooth motion of the shoulder. For this reason, 
other modes have been added for user ease. 

3.3. Freeze Mode 

It is common during many movements of articulated 
bodies that some parts of the body remain locally stable. For 
example, in reaching movements the legs and hips may remain 
stable, and during walking the head usually is directed forward. 
Because of the complex interplay mentioned above, it is 
difficult to find the sequence of forces or torques that will 
ensure local stability. A simple solution to this is to simulate 
the stabilizing torque (or force) with a tight spring and damper 
clamped about the local position. This can be viewed as a temporary 
change in the range of the joint end limits. 
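
The spring-and-damper clamp can be sketched for a single revolute degree of freedom as follows. The stiffness, damping, inertia, and disturbance values are hypothetical, as is the integration scheme; only the idea of a tight spring and damper about the held position comes from the text.

```python
def freeze_torque(pos, vel, hold_pos, stiffness=500.0, damping=40.0):
    """Stabilizing torque from a spring and damper clamped about hold_pos."""
    return -stiffness * (pos - hold_pos) - damping * vel

def settle(hold_pos, disturbance, inertia=1.0, deltime=1.0 / 300.0, steps=900):
    """Integrate one frozen degree of freedom under a constant disturbing torque."""
    pos, vel = hold_pos, 0.0
    for _ in range(steps):
        torque = disturbance + freeze_torque(pos, vel, hold_pos)
        vel += torque / inertia * deltime   # update velocity first,
        pos += vel * deltime                # then position (semi-implicit Euler)
    return pos

# A joint held at 0.5 rad while another body part pushes on it with 5 units of torque.
final_pos = settle(hold_pos=0.5, disturbance=5.0)
```

With these values the joint settles a small distance (disturbance divided by stiffness) from the held position, so a stiffer spring gives a tighter freeze at the cost of a stiffer system to integrate.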

3.4. Balance Mode 

While automatic response to gravitational forces makes 
certain motions easy to simulate, it can create problems with 
coordinated movements. For example, walking is no longer 
merely a question of manipulating the legs in the 
proper sequence, but also of balancing the upper part of the 
body to keep it from falling over. A simple way to achieve 
balance is to describe a world-space orientation vector for particular 
segments, such as the trunk, and apply an external 
force to counteract any motion away from this orientation. 
Experimentation shows this technique is acceptable in 
limited cases; whether it will provide realistic balance in all 
cases has not been explored in depth. 

3.5. Hybrid K-D Mode 

The freeze and balance modes do help with some of the 
motion control problems introduced by dynamics, but leave the 
major problem of determining forces or torques for controlled 
motion, rather than just stability. An approach to this problem 
that has been reasonably successful is to describe the desired 
motion in kinematic terms, as kinematic control functions 
specifying rotation (for revolute joints) or translation (for sliding 
joints) over time. The exact same Virya interactive interface 
can be used; the interpretation of the control functions 
merely changes. 

These kinematic descriptions are then used to find the 
forces and torques applied at specified degrees of freedom. 
The method used to find these forces and torques is trivially 
simple, but surprisingly effective. The equations used are 

delp = des_pos - pos 
delv = delp / deltime - vel 
ft = delv * m / deltime 

For sliding joints, delp is the difference between the desired 
position at the next time sample (des_pos) and the present 
position (pos). delv is delp divided by the time between samples 
(deltime) minus the present velocity (vel), in other words, 
the amount the velocity must be altered to achieve the desired 
position at the next time sample. ft is the estimated force that 
must be applied to achieve this, and m is the mass of the segment 
distal to this degree of freedom. For revolute joints, the 
same formula is used but the velocities are angular, ft is a 
torque, and m is a moment of inertia. Probably because 
dynamic analysis is done from 3 to 30 times as frequently as 
images are displayed, no feedback is needed to achieve smooth 
motion. 
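
The three equations translate directly into code. The sketch below applies them to a single sliding degree of freedom with no other forces acting; the update order (velocity first, then position), the mass, and the desired trajectory are assumptions made for the example rather than details of Deva.

```python
def kd_force(des_pos, pos, vel, m, deltime):
    """Estimated force (or torque) needed to reach des_pos at the next time sample."""
    delp = des_pos - pos                 # position error
    delv = delp / deltime - vel          # required change in velocity
    return delv * m / deltime            # force = mass * acceleration

def track(desired, m=2.0, deltime=1.0 / 300.0):
    """Drive a free mass along the desired positions using kd_force alone."""
    pos, vel = desired[0], 0.0
    path = [pos]
    for des_pos in desired[1:]:
        ft = kd_force(des_pos, pos, vel, m, deltime)
        vel += ft / m * deltime          # velocity becomes delp / deltime
        pos += vel * deltime             # position lands on des_pos
        path.append(pos)
    return path

desired = [0.0, 0.01, 0.03, 0.06]        # hypothetical kinematic control samples
path = track(desired)
```

In this idealized case the mass follows the kinematic description exactly. When gravity, collisions, or connected segments also act on the degree of freedom, the force is only an estimate, which is where the dynamics adds its influence to the motion.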

The advantage to this method is that the user can enter 
motion directions in the intuitive kinematic form for those 
joints whose particular motion is known (or desired) and yet 
retain the advantage of dynamics. 

3.6. Pure Kinematic Mode 

Kinematic mode is separate from the dynamic animation 
package. Deva can operate as a strictly kinematic animation 
system, in which case the control functions specified by Virya 
are taken to represent the actual desired motion for all degrees 
of freedom. The dynamic analysis routines are entirely bypassed 
and fast kinematic animation is possible. 

4. Virya User Interface 

The Virya screen (see Figure 1) consists of three regions. 
The lower half of the screen contains small joint windows 
representing each degree of freedom of the system. The upper 
right quadrant consists of a menu of commands for designing 
and saving control functions. The upper left quadrant is a large 
window where control functions can be viewed and altered. 
The user selects menu items or designates points on the screen 
by using a graphics tablet and puck. Virya runs on an Evans 
and Sutherland PS300/340 graphics system. 

Each joint window contains a label identifying the degree 
of freedom represented there; e.g. "J4 elbow (1) rz" refers to a 
z-rotation of the elbow (joint number 1). The line through 
each window joins the control points that define the curve 
representing motion control information for that degree of freedom. 
The user can alter these control points and thus define 
the shape of the motion control function. By specifying more 
or fewer points and spacing them differently, such control 
functions can be used to control both the path of the motion 
and its velocity (kinematics) or the strength of the force or 
torque applied (dynamics). 

The menu consists of I/O commands, joint commands, 
vertex commands, and miscellany. The I/O commands are 
LOADBODY (input a body description including segments, 
joints, degrees of freedom, and present configuration); LOAD 
(input a previously created Virya file containing motion control 
information); and SAVE (save the present motion control 
description in a file). Joint commands are SHOWJ (bring a particular 
control function into the large window); COPYJ (copy 
the control function for one degree of freedom to another); 
ACTIVEJ (designate one control function as modifiable); 
REDRAWJ (redraw the active control function); and CURVEJ 
(use the control points to create a cubic interpolatory spline 
curve). The vertex commands all refer to the degree of freedom 
that has been designated "active". They are INITV, 
ADDV, DELV, MOVEV, and LOCV. INITV initializes the 
control function to a horizontal line along the time axis. 
ADDV, DELV, and MOVEV add, delete and move control 
points defining the control curve. LOCV gives the exact 
numeric value of a point on the screen. The miscellaneous 
commands are CONFIRM (confirm a change); QUIT (leave 
Virya); and TABLET (a binary switch between inputting control 
points from the terminal or from the graphics tablet). 
The DKFRB (dynamic / kinematic / frozen / relaxed / balanced) 
option allows the user to designate control modes for 
each degree of freedom and a time span during which the 
modes are in force. The default control mode for dynamics is 
dynamic mode. 

Virya motion descriptions are stored in an ascii file 
(which under present conditions makes up for space consumption 
with user convenience); part of a sample file is shown in 
Figure 3. 

5. Use of Virya with 3-D Keyframing 

Because motion of a complex articulated body such as a 
human figure is sometimes difficult to visualize in terms of 
angular or translational motion of individual degrees of freedom, 
it has been found convenient to initially define the motion 
as a series of keyframes developed by interactively positioning 
the body on the screen using Deva. A sequence of keyframes 
with the times they should occur is stored in an ascii file (part 
of such a file is shown in Figure 2). These files can be converted 
to Virya control function files and used to generate 
control curves which are sampled to drive the animation or 
called into the Virya editor for modification. 

These key-frame-derived control functions initially place 
all degrees of freedom in hybrid K-D mode, but because this 
almost completely constrains the motion, there would be little 
point in doing dynamics. Typically the user modifies this file 
placing many degrees of freedom into relaxed, frozen, or balanced 
modes to allow dynamics to fulfill its purpose of adding 
realism. 

6. Sample Session 

A simple session using Virya will be described to illustrate 
its use. First, the user enters the animation system Deva 
and calls up a previously stored figure, in this case the 
24-degree-of-freedom human figure Joe. Using dials and the keyboard, 
five keyframe configurations for Joe are found. These 
configurations are to occur at 0, 3, 6, 7, and 8 seconds in the 
animation. They are stored in ascii form in a keyframe file 
shown in part in Figure 2. The lines with a single number 
represent the times when the keyframes occur, and the following 
numbers are positions for each degree of freedom of 
motion at that time. 

The keyframe file is converted using Deva to the format 
needed for interaction with Virya. In the keyframe file, positions 
are grouped by the times when they occur; in the Virya 
file they are grouped by degree of freedom. Figure 3 shows 
part of the Virya file originally derived from the keyframe file, 
but somewhat modified using Virya. Following the control 
positions (taken from the keyframes) for each degree of freedom 
are definitions of the states, or modes, over time. Initially, 
when the file is created from a keyframe file, all degrees of 
freedom are in the hybrid K-D mode (K) during the default 
time span (0-100 seconds). (The 4 numbers after the time span 
are only used in balance mode, where they represent the position 
vector and the amount of deviation from it allowed.) During 
the motion described (lifting the legs and swinging the 
arms), many degrees of freedom (such as the waist) are frozen 
into their local configuration and others (such as the right knee) 
are relaxed. The modes of these joints are changed from the 
default K-D mode. 
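
The regrouping from keyframes to per-degree-of-freedom control points amounts to a transpose. The sketch below is an assumption-laden illustration: the function names and data layout are hypothetical, and the degree-to-radian conversion is inferred from the figures (30 degrees in the keyframe file, 0.523599 in the Virya file) rather than stated in the text.

```python
import math

def keyframes_to_controls(keyframes, degrees_to_radians=True):
    """Regroup keyframes by degree of freedom.

    keyframes: list of (time_s, [position for each degree of freedom]).
    Returns {dof_index: [(time_s, value), ...]} with times in ascending order.
    """
    keyframes = sorted(keyframes, key=lambda kf: kf[0])
    n_dof = len(keyframes[0][1])
    controls = {dof: [] for dof in range(n_dof)}
    for time_s, positions in keyframes:
        if len(positions) != n_dof:
            raise ValueError("every keyframe must list every degree of freedom")
        for dof, value in enumerate(positions):
            if degrees_to_radians:
                value = math.radians(value)   # assumed unit change between the files
            controls[dof].append((time_s, value))
    return controls

def default_states(n_dof, span=(0, 100)):
    """Initially every degree of freedom is in hybrid K-D mode (K) for the default span."""
    return {dof: [("K", span[0], span[1])] for dof in range(n_dof)}

# Two hypothetical degrees of freedom taken at three keyframe times.
keys = [(0.0, [0.0, 180.0]), (3.0, [30.0, 180.0]), (6.0, [0.0, 180.0])]
controls = keyframes_to_controls(keys)
states = default_states(2)
```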
This Virya file was used to drive dynamic analysis and the 
resultant, predicted motion stored in another, similar Virya file. 
This second, output file is a kinematic description of the 
dynamically predicted motion, and was sampled to produce the 
animation shown in Figure 4. Keyframes are approximately in 
locations (0,0), (1,3), (3,0), (3,2), and (3,5) (row major order). 
(Actual dynamic analysis was done 300 times per second; not 
all of these configurations are stored for the kinematic description 
and display.) 

7. Conclusions 

The interactive graphical editor Virya is used to design 
and store motion control commands for kinematic or dynamic 
animation using control functions and control modes. For 
kinematic animation, the user designs control functions 
representing positions over time for each degree of freedom. 
For dynamic animation, control functions may represent either 
kinematic information (positions over time) or dynamic information 
(forces or torques over time). To alleviate some of the 
control problems that accompany the advantages of dynamic 
animation, the freeze, balance, and relaxed control modes are 
also available. The kinematic output of dynamic analysis routines 
and motion control information derived from other 
higher-level control methods can be stored in the form of Virya 
data files. This format is convenient for low-level modification 
and sampling for animation generation. 
Figure 1. The Virya Screen 

Figure 2. Partial Keyframe File 

0.000000 
180.000000 180.000000 180.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 -180.000000 0.000000 
0.000000 0.000000 -180.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 

3.000000 
180.000000 180.000000 180.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 -150.000000 0.000000 
30.000000 0.000000 -210.000000 0.000000 
25.000000 0.000000 30.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 

6.000000 
180.000000 180.000000 180.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 -210.000000 0.000000 
30.000000 0.000000 -180.000000 0.000000 
25.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 

Figure 3. Partial Virya File 

dof 18 Pjnt 7 Type 0 Numv 5 j_hipr 
Control 0.000000 0.000000 
Control 3.000000 0.523599 
Control 6.000000 0.000000 
Control 7.000000 0.000000 
Control 8.000000 0.000000 
States K 0 6 0 0 0 0 
States F 6 8 0 0 0 0 
Q 

dof 19 Pjnt 7 Type 1 Numv 5 j_hipr 
Control 0.000000 0.000000 
Control 3.000000 0.000000 
Control 6.000000 0.000000 
Control 7.000000 0.000000 
Control 8.000000 0.000000 
States F 0 8 0 0 0 0 
Q 

dof 20 Pjnt 8 Type 0 Numv 5 j_kneer 
Control 0.000000 0.000000 
Control 3.000000 0.000000 
Control 6.000000 0.000000 
Control 7.000000 0.000000 
Control 8.000000 0.000000 
States R 0 8 0 0 0 0 
Q 
ABSTRACT 

The animation of human figures is one of the major 
problems in computer animation. A recent approach to this 
problem is the use of dynamic analysis to compute the movement 
of a human figure given the forces and torques operating 
on the body. One of the main problems with this technique is 
computing the forces and torques required for particular 
motions. As a solution to this problem an interactive interface 
to our dynamics routines has been produced. This interface, 
along with a collection of low level motion processes, can be 
used to control the motion of a human figure model. In this 
paper both the user interface to our dynamics routines and the 
motion processes that we use are described. 


KEYWORDS: human figure animation, dynamic analysis, 
interactive control of human figures 


1. Introduction 

One of the more challenging parts of computer animation 
is the animation of human figures and other articulated bodies 
(for example, robots and animals). Over the past decade a 
number of techniques have been developed for the animation of 
human figures. These techniques vary from digitizing human 
movement, to the development of kinematic models of human 
motion. Recently the dynamic analysis of human motion has 
been proposed as a way of animating human figures 
[Armstrong and Green 1985a, Wilhelms and Barsky 1985]. 

Dynamic analysis has a number of advantages over other 
approaches to human animation. Since this technique is based 
on well known techniques from physics and robotics, it is capable 
of producing very realistic motion. The motion of the 
human figure is controlled by forces and torques that are 
applied to the limbs of the body. In most motions only a small 
number of the limbs are actively involved; these limbs are called 
the controlled limbs. The other limbs in the body either maintain 
the same relative position, or follow the motion of the 
controlled limbs. The latter motion can be automatically produced 
by the dynamics software; therefore, the animator only 
needs to specify motion information for the controlled limbs. 
The small volume of information required to produce motion 
could lead to human animation systems that are much easier to 
use than existing systems. 

There are two main problems associated with the use of 
dynamic analysis for human animation. The first problem is 
the amount of computer time required to compute the motion 
of the human figure. Traditional approaches to the computation
of human motion require overnight batch runs for simple
animation sequences [Wilhelms 1985]. We have developed an
approach to the solution of the equations of motion that is
significantly faster than other techniques. This approach can
produce near-real-time animation on commonly available
hardware. Some of our results in this area are described in
section 2.

The second problem is determining the torques and forces 
required to produce a particular motion sequence. Animators 
work in terms of body positions and complex motions, such as 
walking and running. They have no experience with the 
torques and forces required to produce the motion they want. 
There are two parts to our solution to this problem. The first 
part is the development of a number of low level motion 
processes. These processes generate the torques and forces 
required to produce particular types of motion. The second 
part of the solution is an interactive user interface that allows 
the animator to specify values for the parameters used by the 
motion processes, or directly apply torques and forces to the 
body while it is in motion. The animator can obtain immediate 
feedback on the effects of changes in parameter values, or the 
effects of torques and forces. Through this interface the animator
is able to experiment with different ways of producing
motion, and develop a feel for how they can be used to produce
the motion he or she wants. There is also the possibility
of producing canned motions that can be called upon by the 
animator. These motions could be parameterized so they can 
be customized to a particular situation. The work we have 
done in this area is discussed in sections 3 and 4. 

2. Near-Real-Time Dynamics 

One of the main drawbacks to using dynamic analysis for 
human animation has been the amount of computing required. 
Some of the formulations of the equations of motion for 
human figures and techniques for their solution are based on 
the techniques developed in mechanical engineering for the 
analysis of general linkages such as those found in machines. 
The linkage structure of the human body is not as complicated
as the systems studied in mechanical engineering, where any of
the links in the mechanism could directly affect the motion of
any of the other links. This can give rise to a graph structure
for the links. On the other hand, the human body can be
viewed as a tree of links with no interconnections between the
leaves on different branches. This observation significantly
simplifies the equations of motion and allows for efficient
solution techniques. This version of the equations of motion and
techniques for their solution have been described elsewhere
[Armstrong and Green 1985a,b]. At this point we will summarize
the results of this work.
Two implementations of our solution of the equations of
motion have been produced. The first implementation is
designed to run on a single processor. It is
written in C and currently runs on a DEC VAX 11/780, SUN
workstation, and IRIS workstation. The time required to compute
a motion sequence depends upon the inertias used for the
body parts. The step size required for a stable solution of the
equations of motion is proportional to the square root of the
values of the inertias. In our current implementation the inertias
are about a factor of 10 larger than those found in a
human body. This results in a computation time that is within
a factor of 3 to 10 of real-time depending on the complexity of
the figure and the complexity of the motion. The animation
produced by this implementation is fast enough to get a good
feel for the motion while the program is running. The other
implementation is designed to run on a network of processors
[Armstrong et al. 1986]. Since the human body can be viewed
as a tree of limbs, subsets of limbs of the tree can be assigned
to different processors, and a large amount of the computation
can proceed in parallel. The results we have obtained so far
indicate that real-time animation could be produced by a net¬ 
work of four SUN 3 workstations. 
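
As a rough worked illustration of the step-size relationship stated in this section (the symbols $\Delta t$ for the stable step size and $I$ for the inertia are introduced here and are not the authors' notation), increasing the inertias by a factor of 10 increases the stable step size by a factor of about 3.2:

```latex
\Delta t \propto \sqrt{I}
\quad\Longrightarrow\quad
\frac{\Delta t_{10 I}}{\Delta t_{I}} = \sqrt{10} \approx 3.16
```

Under this relationship alone, and all else being equal, using realistic inertias would require roughly three times as many steps for the same motion sequence.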


These results indicate that it is possible to construct a
system where the animator can manipulate the dynamics of a
human figure in real-time.

3. Control Strategies

Most human motion involves only a subset of the limbs in
the body. When the animator develops these motions he or she
will want to work with a small number of limbs at any point in
time. Controlling more than two or three limbs in real-time is
probably beyond the capabilities of most people. Thus, the
animation system should allow the animator to build up his or
her motion sequence on a limb-at-a-time basis. The main
problem with this approach is that when one limb moves, the
other limbs it is connected to are subjected to forces and
torques as a result of its motion. This is a natural result of
Newton’s laws of motion. Some of these secondary motions
may be desirable. For example, when the upper arm moves the
animator will want the lower arm and all the limbs attached to
it to also move. Other secondary motions are not desirable. A
good example of this is when the figure reaches for an object:
only the arm should move and not the body as a whole.

The solution to this problem can be based on the use of a
number of motion processes, which when added to the body
model guarantee reasonable behavior. These motion processes
are similar in function to the finite state automata used by
Zeltzer [Zeltzer 1982]. A number of features of human motion
can be handled by a collection of motion processes. Some of
these features are: maintaining the same relative position
between two connected limbs, balance, ground reaction, and the
performance of simple motions, such as reaches. The motion
processes can be divided into two basic categories, called limb
processes and global processes. A limb process is responsible
for the motion of only one limb. Each limb can have one or
more motion processes associated with it at any one time. The
number and types of motion processes on each limb are under
the control of the animator. The global motion processes
affect more than one limb. These processes are responsible for
motions that require global knowledge of the state of the body.
Examples of this type of process are balance and ground
reaction.

The motion processes described in the following sections
were motivated by the work that has been done in biomechanics
(see [McMahon 1984a] for a good introduction to some of the
relevant work). We have used biomechanics as a source of
ideas for controlling the motion of the body; we are not trying
to accurately model the human nervous or muscle systems.

3.1. Limb Motion Processes

The human figure model consists of a number of links,
with each link representing a part of the body. The
model contains 14 links (head, neck, upper body, lower body,
upper arm, lower arm, upper leg, lower leg, and foot). At the
proximal end of each link there is a three-degree-of-freedom
rotational joint connecting it to its parent. The upper body
link has three extra degrees of freedom representing the
translation of the body with respect to the world coordinate system.
The current state of the model is given by the position of the
upper body and the three rotation angles at each joint.

There are nine parameters that can be used to control the
motion of each link. These parameters are the components of
the internal torques (torques generated at the joints),
external torques (torques applied from outside of the body),
and external forces (forces applied from outside of the body).
Only the internal torques are used by the limb motion
processes. At any point in time there can be one or more
motion processes associated with each limb. Each of these
processes contributes to the internal torque that is applied to
the limb.

The motion processes are controlled by a joint information
table. This table contains one entry for each degree of
freedom in each joint. The table entry contains the state of the
degree of freedom and parameter values that are required by
the associated motion processes. The state is a bit vector that
specifies the motion processes that are currently associated with
that degree of freedom. There is one bit in this vector for each
motion process. If the bit is set, the motion process can
affect that degree of freedom.

The motion processes are executed on each iteration of the
dynamics calculations. Each degree of freedom in each limb is
considered separately. At the start of the processing for a degree
of freedom, its internal torque is set to zero. Then the state bit
vector is examined to determine the motion processes that are
to be executed. Each motion process uses the current state of
the link, plus its own parameters (stored in the joint information
table) to compute a contribution to the internal torque.
At the end of this process, new internal torques have been gen¬ 
erated for each limb in the model. 
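
The per-iteration procedure described above can be sketched as follows. This is a minimal illustration and not the authors' C implementation: the names `Process`, `DofEntry`, and `internal_torque`, the bit assignments, and the sign of the friction term are all assumptions made for the example. Only two of the processes are shown.

```python
from dataclasses import dataclass
from enum import IntFlag


class Process(IntFlag):
    # One bit per motion process; the bit assignments are illustrative.
    FREE_SWING = 1
    FRICTION = 2


@dataclass
class DofEntry:
    # Hypothetical joint information table entry for one degree of freedom.
    state: Process            # bit vector of attached motion processes
    friction_k: float = 0.0   # constant of proportionality for friction


def free_swing(entry, angle, velocity):
    # Null process: contributes nothing to the internal torque.
    return 0.0


def friction(entry, angle, velocity):
    # Proportional to the relative angular velocity; the opposing sign
    # is an assumption of this sketch.
    return -entry.friction_k * velocity


PROCESSES = {Process.FREE_SWING: free_swing, Process.FRICTION: friction}


def internal_torque(entry, angle, velocity):
    # The torque is reset to zero, then each process whose bit is set
    # adds its contribution.
    torque = 0.0
    for bit, process in PROCESSES.items():
        if entry.state & bit:
            torque += process(entry, angle, velocity)
    return torque
```

Keeping the processes in a table keyed by bit means that a new process can be added without changing the loop, which matches the description of processes being attached to and detached from a degree of freedom at run time.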


At the present time we are using four motion processes. 
The first process, called free swing, is a null process that does 
not contribute to the internal torque of the joint. This process 
allows the joint to move freely (without any constraints) in the 
degree of freedom it is attached to. 

The second motion process, called friction, generates a 
velocity dependent friction which is used to slow the limb down 
when it is in motion. The friction in a joint is proportional to 
the relative angular velocity of the limb with respect to its 
parent. The constant of proportionality can be interactively 
controlled by the animator. This model of friction agrees with 
results from biomechanics [McMahon 1984a]. 

The third motion process, called the maintain process, is 
used to maintain the relative angular positions between two 
adjacent limbs. When producing a motion, the animator may 
want only a subset of the limbs to move. The other limbs in 
the body should stay in the same relative positions. This 
motion process is used to achieve this goal. When using this 
motion process the animator specifies a parameter, called 
center, which is the desired angle between the limb and its 
parent. The motion process maintains the angle between the 
two limbs close to the value of center. The angle between 
limbs cannot be clamped to the center value for two reasons. 
First, this behavior is not realistic from a biological point of 
view. Second, fixing an angle adds a constraint to the dynamics
equations and would require reformulating the solution.
The following function, T(x), is used to determine the torque 
applied to a joint given the angle, x, at the joint, and the 
desired angle, center. 

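\[
T(x) =
\begin{cases}
\alpha \left( e^{\beta (x - \mathit{center})} - 1 \right) & \text{if } x > \mathit{center} \\
-\alpha \left( e^{\beta (\mathit{center} - x)} - 1 \right) & \text{otherwise}
\end{cases}
\qquad (1)
\]

The parameters $\alpha$ and $\beta$, which can be set by the animator,
determine the strength of the torque applied at the joint. Initial
values are supplied for center, $\alpha$, and $\beta$ that will maintain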
the human figure in a standing position. The form of this 
motion process is based on results from studies on muscle 
dynamics [McMahon 1984a] [Hatze 1977]. 
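
Expression (1) can be written directly as a small function. The sketch below assumes the two garbled parameter symbols are the strength parameters (called `alpha` and `beta` here) and keeps the signs as printed; how the dynamics routines apply the resulting torque to the joint is not stated in this section, so no sign convention is assumed beyond the formula itself. The function name is hypothetical.

```python
import math


def maintain_torque(x, center, alpha, beta):
    """Expression (1): torque for joint angle x and desired angle center."""
    if x > center:
        return alpha * (math.exp(beta * (x - center)) - 1.0)
    return -alpha * (math.exp(beta * (center - x)) - 1.0)
```

The torque is zero at `center`, antisymmetric about it, and grows exponentially with the deviation, so small deviations are tolerated while large ones are resisted strongly.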

The fourth motion process, called the simple move process,
is used to move a limb from one position to another. The
animator specifies the new angle between the limb and its 
parent, and this motion process produces a smooth motion 
between the current limb position and the new limb position. 
When the new limb position is reached the maintain process is 
invoked to maintain the new position. In order to move the 
limb from its old position to its new position a sequence of 
torques (one for each time step of the dynamics calculations) 
must be applied to the joint. This sequence of torques must 
satisfy two conditions. The first condition is that the torques 
must be strong enough to move the limb to its new position. 
The second condition is that the generated motion must appear 
to be natural. 

In order to satisfy the first condition we use expression 
(1) to estimate the amount of torque required to reach the new 
position (the new position is used as the value of center). If 
this torque was applied to the joint, the limb would reach its 
new position within one or two iterations of the calculations. 
Time steps on the order of 0.01 seconds are usually used in the
dynamics calculations; therefore, the limb would move from its
original position to its new position in a small fraction of a
second. For most types of motion this change of position is
far too rapid. The torques produced by expression (1) are
unrealistic, but they do guarantee that the limb will reach the
new position, so they are a good starting point for our
calculations.

In order to produce more realistic motion we place two 
constraints on the torque produced by expression (1). The first 
constraint is that the torque cannot exceed a certain maximum 
value (these values can be found in tables of maximum torques 
for physical activities [Plagenhoef 1971]). The second constraint
is that the rate of change of torque cannot exceed a
maximum value (reasonable values for this parameter are scattered
in the literature). When these two constraints are applied
to the torques produced by expression (1), smooth motion is 
produced in the first part of the action. The main problem 
with this technique is that the motion does not slow down as 
the final position is reached. Even with this problem the 
results look fairly realistic. 
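
The two constraints can be sketched as a pair of clamps applied at each time step. This is an illustrative reading of the paragraph above, not the authors' code: the function name, the argument names, and the order in which the two limits are applied are assumptions.

```python
def limit_torque(desired, previous, max_torque, max_rate, dt):
    """Limit a desired torque by rate of change, then by magnitude."""
    # Second constraint: the torque may change by at most max_rate * dt
    # in one time step.
    max_step = max_rate * dt
    step = max(-max_step, min(max_step, desired - previous))
    torque = previous + step
    # First constraint: the magnitude may not exceed max_torque.
    return max(-max_torque, min(max_torque, torque))
```

Called once per iteration with the torque from expression (1) as `desired` and the previously applied torque as `previous`, this yields a torque that ramps up gradually and then saturates, which corresponds to the smooth first part of the action described in the text.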

The motion in the second half of the action can be 
improved by decreasing the torque applied to the limb as it 
approaches the new position. The angular mid-point of the 
motion is fairly easy to determine (the average of the initial 
and final angles). After this point the torque applied to the 
limb should be decreasing. This can be achieved by setting the 
maximum torque to the torque value at the mid-point of the 
action. At each iteration after the mid-point, the maximum
torque is scaled by the square root of the ratio of the distance
from the current position to the goal, to the distance
from the mid-point position to the goal. That is, we use the
following expression:

\[
\mathit{max\_t} = \mathit{center\_t} \, \sqrt{\frac{\mathit{current} - \mathit{goal}}{\mathit{half} - \mathit{goal}}}
\qquad (2)
\]

where: center_t = the torque at the angular mid-point of
the action

current = current joint angle
goal = final joint angle
half = the mid-point of the action

The above expression seems to produce a smooth motion in the 
second half of the action. 
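
A minimal sketch of this torque limit, following the square-root rule stated in the text, is given below. The function name is hypothetical, and the guard against a negative ratio (which could arise if the limb overshoots the goal) is an addition of this sketch rather than something the paper describes.

```python
import math


def tapered_max_torque(center_t, current, goal, half):
    """Maximum torque allowed after the angular mid-point of the action."""
    ratio = (current - goal) / (half - goal)
    # Clamp at zero so that an overshoot does not produce a math domain error.
    return center_t * math.sqrt(max(0.0, ratio))
```

At the mid-point the limit equals `center_t`, and it falls to zero as the limb reaches the goal. For example, with a quarter of the remaining distance left the limit is half of `center_t`.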

3.2. Global Motion Processes 

The global motion processes use the states of several limbs 
in order to control the global motion of the body. These 
processes can generate torques and forces that are applied to 
several of the body’s limbs. At the present time two global 
processes are used in our software. These processes handle
the balance of the figure and its reaction to the ground.

At the present time, a very simple approach is taken to 
balancing the figure. First, the difference between the positions 
of the top and bottom of the body is calculated. A restorative 
force based on this difference is then applied to one or more 
limbs of the body, depending upon the type of motion. This 
technique can be used to keep the body in a standing position, 
but in general it is too restrictive. A more realistic balancing 
technique would be based on the limbs that are in contact with 
the ground. Torques generated by these limbs could be used to 
keep the body in balance. 

One of the main problems with balance is that different 
types of balance may be required for different types of motion. 
In the case of diving and gymnastics, a balance process may 
hinder the motion. Walking is based on falling forward 
[McMahon 1984b], so an external balance process could make it 
impossible for the figure to walk. In other motions, such as 
reaching and lifting, balance is very important, so for these 
motions a balance process must be used. This suggests that 
either a sophisticated model of balance must be developed, or it 
must be under the control of the animator. 

The human figure must be able to react to any of the 
objects it comes in contact with. The most common of these 
objects is the floor or ground. In equilibrium the floor will 
exert a force on the body equal to the body’s weight. This 
solution cannot be used in animation, since the motion of the 
body will change the force applied to the floor by the body. 
The approach that we have used is to monitor the position of 
the body’s feet (or other points of contact with the floor).
A spring force is applied to the feet in order to keep the feet at 
the floor level. This technique works as long as the feet stay 
close to the floor. If the body is falling towards the floor, a 
more sophisticated technique is required. The friction between 
the feet and the floor must also be considered, otherwise the 
feet will slide all over the floor. A good discussion of ground 
reaction can be found in [Wilhelms 1985]. 
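
The spring force on a foot can be illustrated with a simple linear spring. This is a sketch under stated assumptions: the paper does not give the form of the spring, so the linear law, the stiffness parameter, and the function name are all hypothetical, and floor friction is not modelled.

```python
def ground_spring_force(foot_height, floor_height, stiffness):
    """Vertical force that pulls a foot back towards floor level."""
    return -stiffness * (foot_height - floor_height)
```

A foot below the floor receives an upward (positive) force and a foot slightly above it receives a downward force, which is consistent with the remark that the technique only works while the feet stay close to the floor.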

4. Software Architecture 

The interactive animation system that we have developed 
is divided into two main components, which are shown in fig. 
1. The first component, called the front end, is responsible for 
displaying the human figure model and interacting with the 
user. This component of the animation system resides on an 
IRIS 1400 workstation and is responsible for controlling the 
animation system. The second component is the dynamic 
analysis program. This program performs all the dynamics
calculations for the human figure model. The second component
can reside on the IRIS workstation, or on one or more of the 
other processors on our local network. 
Fig. 1 Software Architecture

The two components of the animation system communi¬ 
cate by sending packets over an interprocess communications 
facility (either a pipe, or a socket in the case of an ethernet 
connection). The front end invokes the back end process when 
the user wants to perform dynamics computations. Upon invocation,
the dynamics program reads a start-up file containing a
number of parameters for the computation, and performs the 
first step in the computation. At the end of the first step the 
dynamics program sends a set of packets to the front end. 
There is one packet in this set for each limb in the body, giving 
its current joint angles. There is also a packet specifying the 
current position of the root limb within the world coordinate 
system. At this point the front end program responds with one 
or more packets. These packets are used to change the state of 
a limb, pass parameters for the motion processes, or specify a 
torque or force to be applied to the body. The last packet in 
this exchange is a Next_step packet sent from the front end to 
the dynamics program. At this point the dynamics program 
starts the next calculation cycle. This packet exchange ensures 
that the front end and the dynamics program are always in 
step. 

A packet exchange need not occur at each time step of the 
computation. The time step used in the computation is of the 
order of 0.01 seconds. This time step is too fine for display 
and the types of control we are using. Currently, the packet 
exchanges occur every 0.05 seconds of simulation time. This 
rate is sufficient for both display and control. 
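
The lock-step exchange can be sketched in a single process, with a function call standing in for the pipe or socket. Only the `Next_step` packet name comes from the text; the other packet names, the tuple layout, and the function names are hypothetical, and the dynamics step itself is left as a placeholder.

```python
STEPS_PER_EXCHANGE = 5  # 0.05 s exchange interval / 0.01 s time step


def front_end(state_packets):
    """Hypothetical front end: would display the state, then reply."""
    replies = [("Set_param", "elbow", "friction", 0.5)]  # optional changes
    replies.append(("Next_step",))                       # always sent last
    return replies


def run_dynamics(total_steps, limbs, exchange):
    """Run the dynamics loop, exchanging packets every fifth step."""
    params = {}
    exchanges = 0
    for step in range(1, total_steps + 1):
        # ... one iteration of the dynamics calculations would go here ...
        if step % STEPS_PER_EXCHANGE != 0:
            continue
        # One packet for the root position plus one per limb.
        packets = [("Root_position", (0.0, 0.0, 0.0))]
        packets += [("Joint_angles", limb, (0.0, 0.0, 0.0)) for limb in limbs]
        for reply in exchange(packets):
            if reply[0] == "Next_step":
                break  # resume the calculation cycle
            if reply[0] == "Set_param":
                _, joint, name, value = reply
                params[(joint, name)] = value
        exchanges += 1
    return exchanges, params
```

Because the dynamics loop does not continue until it has seen `Next_step`, the two sides cannot drift apart, and one second of simulation time (100 steps) produces 20 exchanges.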

This division of the animation system into separate 
processes has two main advantages. First, separating the 
dynamics computations from the display and user interface 
allows the use of either the single processor or distributed
versions of the computations with the same user interface. In
other words, as far as the animator is concerned interacting
with the single processor and distributed versions of the
computations is the same. The only noticeable difference is the speed
of computation. This allows the animator to take advantage of
the available computing resources without changing his or her mode of
operation. Second, the use of separate processes allows several
people to work on the project without interfering with each
other.

Fig. 2 Screen Layout


The screen layout for the front end is shown in fig. 2. 
The display screen is divided into three main sections, called the 
figure display, main menu, and motion menu. The figure 
display is used for displaying a graphical representation of the 
human body. The human body is represented by a three 
dimensional polygon model, with four to six polygons defining 
each of the limbs. Each of the polygons in a limb has a different
colour, facilitating the identification of the different sides
of the figure. This type of model allows us to display the 
human figure at a rate of 14 frames per second. 

The main menu contains a collection of commands that 
can be invoked when the dynamics computations are not being 
performed. The commands on this menu are used to change 
the eye position, record the motion produced by the dynamics 
routines, start the dynamics computations, play back a motion
sequence that has been previously computed, save a motion 
sequence on a disk file, and retrieve a previously computed 
motion sequence from disk. 

Once the dynamics computations have been started they 
can be interrupted in two ways. If the user presses one of the 
mouse buttons, the dynamics computations are suspended at the 
next packet exchange, and control is transferred to the main 
menu. The dynamics computations can be restarted by selecting
the dynamics command from the main menu. This facility
allows the user to change the eye position, or record parts of 
the motion while the computations are in progress. 

The other way of interrupting the dynamics computations 
is to move the mouse into the motion menu area. At this point 
the cursor changes shape, and the user can select one or more 
commands from the motion menu. The motion menu contains 
the name of each joint in the body and the name of the 
parameters for the motion processes. In order to change a 
parameter value for a motion process, the user selects the joint 
and parameter name from the menu, and then selects the value 
command. At this point the user is prompted for the new 
value for the parameter. Similarly the user can change the
state of any of the limbs in the body. When the user is finished
modifying the motion processes, he or she can move the
mouse into the figure display in order to resume the
computation.

This user interface has three main advantages over batch 
dynamics computations. First, at any point in the computation 
the user can suspend the computation, and then play back (in
real-time) the motion sequence that has been produced. This
allows the user to terminate computations that are not producing
the desired motion before the end of the motion sequence.
This saves both animator and machine time. Second, the animator
can interactively change the motion as it is being computed.
This allows the animator to react to the motion of the
human figure, and frees him or her from precisely timing the
movements of the figure. Third, the near-real-time computation of
the motion allows the animator to experiment with different 
types of control strategies. 

5. Summary 

In this paper we have reviewed some of the work that has 
been done on applying dynamic analysis to the animation of 
human figures. We have also summarized the work we have 
done on producing efficient algorithms for solving the equations
of motion and their implementation in both single processor
and multiple processor environments.

The significant new material in this paper is the discussion 
of automatic motion processes for controlling the human figure 
and the interactive system that we have developed for human 
figure animation. This interactive animation system allows the 
animator to take advantage of the power and flexibility of the 
new dynamic analysis techniques. 
ABSTRACT

This paper describes a method for representing 
and animating three-dimensional articulate 
figures. It permits the definition of a model 
consisting of segments and joints, and the 
specification of the model's motion at a high 
level of abstraction by the use of a structured 
programming language. 


RÉSUMÉ

Cet article a pour but de décrire une méthode de
représentation et d'animation de modèles
articulés en trois dimensions. L'article
propose d'une part, la définition d'un modèle de
segments et d'articulations et d'autre part, la
précision du mouvement de celui-ci à un niveau
élevé d'abstraction en utilisant un langage
structuré de programmation.

KEYWORDS: Computer animation, figure modelling, 
movement representation 


1.0 INTRODUCTION 

Many advances have been made in computer 
animation in the last few years, especially in 
the area of figure modelling and motion 
specification. Several methods have been 
proposed including the modelling and control of 
figures using procedures (procedural modelling) 
[7], the control of a physical model by the 
application of forces (dynamic modelling) [1],
the use of goal-directed systems for the 
generation of a model's motion [5,10], and the 
use of key frame animation [3], one of the 
oldest animation techniques still in use. 

A different approach for the modelling of a 
three-dimensional articulate figure and the 
subsequent control of its motions will be 
presented here. It permits a user to define a 
model (representing a real three-dimensional 
figure) consisting of segments and joints [11], 
and to specify the desired motion of its joints 
using a high-level structured programming 
language. The problem of figure modelling and 
motion specification is dealt with in terms of 
kinematics: the study of position (displacement) 
and its time derivatives (velocity and
acceleration). Considerations of force and mass 
(dynamics) [4,9], balance [8], and obstacle 
avoidance [6] are beyond the scope of this 
discussion. 


2.0 DESCRIPTION OF MODELS 

Before an attempt is made to specify a
desired motion for a model, a method for
specifying the model must be available. The
model's individual rigid links (segments) are
unspecified in this study. It is assumed,
however, that the model's links can be defined
as graphical objects, using a high-level
graphics language, before the model is
constructed.


2.1 DESCRIPTION OF JOINTS 

A joint has up to three degrees of freedom, 
that is, it can be rotated about each of the X, 
Y, and Z axes. Joints may be restricted to one 
or two degrees of freedom by permitting the 
joint to rotate about only one or two of the 
axes. Thus, simple joints such as fingers 
(hinge joints), and complex joints, such as 
shoulders (ball-and-socket joints) can be 
simulated. A joint connects only two links. A 
joint can move independently of all other 
joints, hence the position of one joint does not 
affect the motion of another. The links are 
restricted in their movements about a joint. 
During a single joint's movement, one link (the 
primary link), is considered stationary and the 
second link (the secondary link) moves with 
respect to the stationary link. A single link 
can function as both a primary link and a 
secondary link if it belongs to two or more 
different joints. One link, the model's main 
link, is singled out from the others. All 
movement ultimately refers to the main link. 
Only one link may be designated as the main link 
and it must be the primary link in all joints it 
belongs to. 

Each instance of a joint is assigned a 
unique identifier to permit subsequent motion 
specifications. The user may place restrictions 
on the range of angles through which a link may 
travel and may specify where the two links are 
to be joined. A typical statement creating an
articulate joint is

JOINT joint_identifier,
      primary_link, relative_location_1,
      secondary_link, relative_location_2,
      x_extremes, y_extremes, z_extremes

where 'joint_identifier' is the unique identifier
for this instance of a joint; 'primary_link' and
'secondary_link' may be either previously defined
graphical objects (which have been defined in
independent coordinate systems) or submodels.
'primary_link' is the stationary link with respect to which
'secondary_link' moves. 'relative_location_1'
and 'relative_location_2' are vectors which
contain the positions, relative to each
graphical object's coordinate system, where the
joint is to be attached. The extreme parameters
are component vectors which store a pair of
extreme angles beyond which the joint cannot
move. Extremes, as well as the joint's current
angles, are given relative to the joint's 
predefined neutral position of (0°, 0°, 0°). 
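
The information carried by a JOINT statement can be pictured as a small record. The sketch below is an illustration in Python rather than the paper's own language: the class name, field names, and the choice to clamp an out-of-range request (rather than reject it) are assumptions, and the numeric values in the usage note are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Joint:
    identifier: int
    primary_link: str
    relative_location_1: tuple
    secondary_link: str
    relative_location_2: tuple
    # One (low, high) pair of extreme angles in degrees for each of the
    # X, Y, and Z axes, relative to the neutral position (0, 0, 0).
    extremes: tuple
    angles: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def rotate_to(self, axis, angle):
        """Set the angle about one axis, clamped to that axis's extremes."""
        low, high = self.extremes[axis]
        self.angles[axis] = max(low, min(high, angle))
        return self.angles[axis]
```

For instance, `Joint(2, "humerus", (1.0, 5.0, 1.0), "forearm", (0.0, 0.5, 1.0), ((0.0, 135.0), (0.0, 180.0), (0.0, 0.0)))` describes a joint that cannot rotate about its third axis, because both extremes for that axis are zero; this is one way to restrict a joint to fewer than three degrees of freedom.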


2.2 INTERNAL REPRESENTATION

The model of an articulate figure is 
described by a tree structure of nodes and arcs. 
Links are represented by nodes and the joints 
are represented by arcs. Each segment is
defined on its own local coordinate system
[2,5,10]. The nodes of each level move with
respect to the nodes of the higher level and are
considered stationary by the nodes below. The
nodes at the leaves of the tree represent the
outermost extremities of the model; the root
node is considered the main link. Figure 1a is
an example of a partial model of a human figure.
Figure 1b contains the model's tree structure.
If a root node is no longer required to act as 
the main link, a new node can be assigned to 
that role by the statement 

FIGURE model_name MAIN major_link_in_body 

It is possible, therefore, to animate a model 
with a different main link in different scenes. 
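
One way to picture a change of main link is as a re-rooting of the tree. The sketch below stores the tree as a map from each link to its parent and reverses the arcs on the path from the new main link to the old root. The paper does not say how the FIGURE ... MAIN statement is implemented, so this is only an assumed illustration, and the function and link names are hypothetical.

```python
def reroot(parent, new_main):
    """Return a new link -> parent map with new_main as the root."""
    new_parent = dict(parent)
    node, previous = new_main, None
    # Walk from the new main link up to the old root, reversing each arc.
    while node is not None:
        next_node = parent[node]
        new_parent[node] = previous
        node, previous = next_node, node
    return new_parent
```

With `{"torso": None, "humerus": "torso", "forearm": "humerus", "hand": "forearm"}`, re-rooting at `"hand"` makes the hand the main link and the torso a leaf, while links that are not on the path keep their original parents.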




Figure 1a    Figure 1b

a) Partial model of a human figure

b) Model's tree structure

Each node is associated with at least one 
arc, as in the case of the extremities which are 
secondary links. Typically, there are two arcs 
associated with each node, corresponding to a 
link (such as the femur of a leg) which acts as 
both a primary and secondary link. However, a 
node may have three or more arcs associated with 
it, as with the hand, which has six arcs (one 
representing the wrist joint, the other five 
representing the finger joints). A node may 
have at most n arcs associated with it. The 
branching factor n is theoretically unlimited, 
but a factor of 10 is deemed sufficient to 
define most articulate figures. 

The tree structure permits the 
representation of open kinematic chains only. A 
kinematic chain is a linear sequence of links 
which are connected by joints. In an open 
chain, one end point is fixed and the remaining 
chain is allowed to move freely, as in Figure
2a. In a closed chain, more than one end point
is fixed in space, as in Figure 2b. For
example, if two hands are joined together and
the arms are allowed to move while keeping the
body motionless, a closed kinematic chain is
formed by the arms. The motion produced in such
a chain is more complex to analyze and is beyond
the scope of this discussion.




Figure 2a Figure 2b 

a) Open kinematic chain 

b) Closed kinematic chain 

Each model has a table associated with it. 
The table contains information from the 
declaration of the model, in particular, the
identifier, the rotational extremes associated
with each degree of freedom, the current 
rotational angles of each degree of freedom, and 
the instantaneous velocity and acceleration of 
the joints along each degree of freedom. The 
table describes the model completely at all 
times. 
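A table entry of this kind might be laid out as in the following Python sketch. The class and field names are assumptions made for illustration; the paper lists only the kinds of information stored.

```python
from dataclasses import dataclass, field


@dataclass
class DegreeOfFreedom:
    """State kept for one degree of freedom of a joint (hypothetical layout)."""
    min_angle: float            # rotational extremes, in degrees
    max_angle: float
    angle: float = 0.0          # current rotational angle
    velocity: float = 0.0       # instantaneous velocity
    acceleration: float = 0.0   # instantaneous acceleration


@dataclass
class JointEntry:
    """One row of the model table, keyed by the joint identifier."""
    identifier: int
    dofs: list = field(default_factory=list)


# The elbow joint from the 'arm' statement: limits (0, 135) and (0, 180).
elbow = JointEntry(2, [DegreeOfFreedom(0.0, 135.0), DegreeOfFreedom(0.0, 180.0)])
model_table = {elbow.identifier: elbow}
```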


2.3 MODEL DEFINITION USING SUBMODELS 

An articulate model can be defined using 
two basic techniques. The first allows the use 
of articulate submodels. Instead of simply 
using static graphical objects for each link, 
the secondary link can consist of a grouping of 
other links and joints, which permits the 
creation of intermediate models or submodels 
that can be referenced frequently. The 
statement 

arm := JOINT 2, humerus, (1.0, 5.0, 1.0), 
forearm, (0.0, 0.5, 1.0), 
(0°, 135°), (0°, 180°) 

creates a model of an arm consisting of a 
forearm and a backarm (humerus) joined at the 
elbow. The secondary link (forearm) is a model 
itself, consisting of links and joints. The 
relative location vector for the secondary link 
is given with respect to the main link in the 
model 'forearm'. 

An advantage of this method is that 
symmetric models can be created with fewer 
statements. For example, creating a model of a 
human body first entails the creation of 
submodels for the right arm and the right leg. 
Once defined, each submodel can be duplicated 
and joined to both the right and left half of a 
human torso. A separate set of left limbs need 
not be defined. A problem arises here, however. 
The values of the joint identifiers should not 
be duplicated for the right hand and left hand 
limbs, or else the two sets of limbs will behave 
identically. Thus, the submodel joint 
identifiers must be modified before the 
submodels can be connected to the torso, to 
ensure unique identifiers. Consider the 
statement 

arm := JOINT 2, humerus, (1.0, 5.0, 1.0), 
forearm <TRANS BY 10>, 
(0.0, 0.5, 1.0), 
(0°, 135°), (0°, 180°) 

which creates a model of an arm consisting of a 
forearm and backarm (humerus) joined at the 
elbow. The secondary link (forearm), which was 
previously defined, is an articulate model whose 
joint identifiers have been mapped onto a new 
range of identifiers starting with the 
identifier 10. 
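The effect of the identifier translation can be sketched in Python. The paper does not say how the new range is assigned; the sketch assumes the submodel's identifiers are renumbered consecutively, in ascending order, from the given start value. The function name `translate_ids` is hypothetical.

```python
def translate_ids(joint_ids, start):
    """Map each joint identifier onto a new range beginning at start.

    Assumption: identifiers are renumbered consecutively in ascending order.
    """
    return {old: start + offset for offset, old in enumerate(sorted(joint_ids))}


# A previously defined submodel using joints 1, 2 and 3, translated by 10.
mapping = translate_ids([3, 1, 2], 10)
```

Two copies of the same submodel translated onto disjoint ranges can then be driven independently, which is what the right and left limbs require.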


2.4 MODEL DEFINITION BY CONSTRUCT 

The submodel approach may produce temporary 
models unnecessarily. For this reason, an 
alternate technique for creating models is 
provided which assumes that none of the model's 
subcomponents is required subsequently. This 
method creates only the resultant model with no 
intermediate articulate models. The following 
construct is an example 


MODEL arm 
    JOINT 1, palm, (0.0, -3.0, 1.0), 
             little, (0.0, 1.5, 0.5), 
             (-90°, 45°), (-45°, 45°) 
    JOINT 2, palm, (0.0, -3.0, 2.0), 
             ring, (0.0, 1.5, 0.5), 
             (-90°, 45°), (-45°, 45°) 
    JOINT 3, palm, (0.0, -3.0, 3.0), 
             middle, (0.0, 1.5, 0.5), 
             (-90°, 45°), (-45°, 45°) 
    JOINT 4, palm, (0.0, -3.0, 4.0), 
             index, (0.0, 1.5, 0.5), 
             (-90°, 45°), (-45°, 45°) 
    JOINT 5, palm, (0.0, -3.0, 5.0), 
             thumb, (0.0, 1.5, 0.5), 
             (-90°, 45°), (-45°, 45°) 
    JOINT 6, ulna, (0.0, -5.0, 1.0), 
             palm, (0.0, 3.0, 1.0), 
             (-90°, 0°), (-25°, 25°) 
    JOINT 7, humerus, (0.0, 5.0, 1.0), 
             ulna, (0.0, 0.5, 1.0), 
             (0°, 35°), (0°, 180°) 
ENDMODEL 


If the model 'hand' comprising the first five 
joints has been defined prior to the use of the 
MODEL construct, then the model definition can 
be simplified: 


MODEL arm 
    JOINT 6, ulna, (0.0, -5.0, 1.0), 
             hand, (0.0, 3.0, 1.0), 
             (-90°, 0°), (-25°, 25°) 
    JOINT 7, humerus, (0.0, 5.0, 1.0), 
             ulna, (0.0, 0.5, 1.0), 
             (0°, 35°), (0°, 180°) 
ENDMODEL 

3.0 DESCRIPTION OF MOTION 



A model can be animated by a number of 
different approaches. Conventional animation 
relies heavily on two-dimensional techniques such 
as key frames and interpolation 
("in-betweening"), while computer animation 
usually expresses position and velocity as 
functions of time. Key frames are the frames 
that carry the information needed to express 
the intended effect of a movement. In animation 
studios, key frames are drawn by the head 
animators, while the frames required to create 
the smooth animation are produced by the in- 
betweeners. 

One of the earlier approaches to computer 
animation consisted of having the computer 
assume the role of the in-betweeners [3]. While 
some very effective animations have been 
achieved, in-betweening is two-dimensional in 
origin and awkward to apply to three-dimensional 
figures. In-between frames are frequently 
linearly interpolated, resulting in temporal 
discontinuities and movements which only 
approximate actual trajectories, thus deforming 
the animation. 

Functions of time are evaluated on a frame-by-frame 
basis, which involves specifying a path 
over time. This method has the advantage of 
producing motion with few temporal 
discontinuities. The description of a three- 
dimensional path as a function of time, however, 
is generally a difficult task. 


3.1 MOTION SPECIFICATION 

The proposed approach for motion 
specification treats motion somewhat differently 
than either the key frame or functional 
technique. Whereas key frame animation views a 
figure with an external perspective and the 
functional approach views a figure from a 
piecewise perspective, the proposed technique 
treats the model as a unit and views it from an 
internal perspective (i.e. from the model's 
point of view). Therefore, a model's positional 
orientation can be specified throughout time. 
The approach resembles key frame animation in 
that key frames, or positional extremes, are used 
throughout time; however, each of the model's 
joints is dealt with at a high level of 
abstraction, thus allowing more 
flexibility in the model's animation. 

The motion for a single joint is specified 
by the joint identifier, an initial starting 
position (angle), and a set of key frames 
(movements). Each movement contains the frame 
identifier at which the movement is completed, 
the position (angle) of the joint at the frame, 
and the interpolation method used to reach this 
positional extreme. 

The position angles relate to a neutral 
position, arbitrarily defined as (0°, 0°, 0°). 
The neutral position is defined with respect to 
the primary link in the joint. The frame 
identifiers represent the number of ticks (units 
of time) which have passed during the motion 
sequence. The frame identifiers are given with 
respect to the start of the motion 
specification. The initial frame is arbitrarily 
assigned the value of zero. The interpolation 
techniques available are: linear, acceleration, 
deceleration, and a combination of both 
acceleration and deceleration. 
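The motion of a single joint, as specified above, can be sketched in Python. The paper names the four interpolation techniques but gives no formulas, so the easing curves below (quadratic ease-in and ease-out, and a cubic for the combined case) are assumptions, as are the names `ease` and `joint_angle`. For brevity the sketch handles one angle per joint rather than three.

```python
def ease(t, method):
    """Map normalized time t in [0, 1] to a normalized position in [0, 1]."""
    if method == "linear":
        return t
    if method == "acceleration":      # starts slowly, ends fast
        return t * t
    if method == "deceleration":      # starts fast, ends slowly
        return 1.0 - (1.0 - t) ** 2
    if method == "both":              # accelerate, then decelerate
        return 3.0 * t * t - 2.0 * t ** 3
    raise ValueError(f"unknown interpolation method: {method}")


def joint_angle(start_angle, movements, frame):
    """Angle of a joint at the given frame.

    movements is a list of (end_frame, angle, method) tuples with strictly
    increasing end_frame values; frame 0 is the start of the specification.
    """
    prev_frame, prev_angle = 0, start_angle
    for end_frame, angle, method in movements:
        if frame <= end_frame:
            t = (frame - prev_frame) / (end_frame - prev_frame)
            return prev_angle + (angle - prev_angle) * ease(t, method)
        prev_frame, prev_angle = end_frame, angle
    return prev_angle  # past the last key frame: hold the final position


# Raise a joint to 90 degrees by frame 10, then return to 0 by frame 20.
elbow = [(10, 90.0, "linear"), (20, 0.0, "acceleration")]
halfway_up = joint_angle(0.0, elbow, 5)
halfway_down = joint_angle(0.0, elbow, 15)
```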

The parameters above are sufficient to 
specify a single joint’s movement throughout an 
animation sequence. Temporal discontinuity 
problems can arise if the interpolation method 
is not chosen carefully. The system can avoid 
these problems by determining a joint's 
instantaneous velocity and acceleration before 
the interpolation, thus making adjustments to 
the selected interpolation. This results in 
smaller temporal discontinuities. 

A complete motion definition for a model 
consists of motion specifications for all of the 
joints present in the model. The specifications 
are independent of each other; therefore, there 
may be different numbers of key frames for each 
joint. All the joint key frames must, however, 
end at the same unit of time. 


3.2 TREE TRAVERSAL 

Before a model's tree structure is 
traversed, the symbol table is updated with each 
joint's current positional angle, and its 
current velocity and acceleration. This 
information is obtained from both the motion 
specification and the interpolation routines. 
The tree structure is completely traversed each 
time the model is displayed (i.e. once each 
frame). The traversal algorithm is a simple 
recursive post-order routine. It assembles each 
model's instance (from the extremities inwards) 
with respect to the main link in the model. The 
resulting instance is displayed by the high-level 
graphical language. Recursive routines 
are employed in the tree traversal because they 
allow storage of the model's primary-secondary 
link relationship in the recursion 
stack. Also, they allow the use of the same 
routines on models whose main link (root) has 
been changed. 
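A recursive post-order traversal of this kind can be sketched in Python as follows. The function names and the `place` callback are hypothetical; in the real system the callback would apply the joint rotations and attach each assembled sub-instance to its primary link.

```python
def assemble(link, children, place):
    """Post-order traversal: assemble every secondary link first, then
    attach the results to the current (primary) link."""
    parts = [assemble(child, children, place) for child in children.get(link, [])]
    return place(link, parts)


visit_order = []


def record(link, parts):
    """Stand-in for the real assembly step; it only records the visit order."""
    visit_order.append(link)
    return (link, parts)


children = {"torso": ["humerus"], "humerus": ["forearm"],
            "forearm": ["hand"], "hand": []}
instance = assemble("torso", children, record)
```

The extremities are visited first and the main link last, and the pending primary links are held implicitly on the recursion stack.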


3.3 EXPLICIT DEFINITION 

A motion can be specified using different 
approaches. There are two different methods for 
explicitly defining a motion. The first 
technique involves the use of a construct for 
the definition of a motion. A typical example 
is 

MOTION motion_name 
    JOINT joint_id 
        POSITION angle_x, angle_y, angle_z 
        FRAME frame_id 
            POSITION angle_x, angle_y, angle_z 
            INTERPOLATE interpolation_technique 
        FRAME frame_id 
            POSITION angle_x, angle_y, angle_z 
            INTERPOLATE interpolation_technique 
    ENDJOINT 
    JOINT joint_id 
        FRAME ... 
    ENDJOINT 
ENDMOTION 

With the construct, the exact definition of 
the motion can be specified. 'motion_name' 
names the set of joint-motion specifications. 
The joint identifiers correspond to those 
defined in the model. Each joint can have at 
most n key frames, where n is a predetermined 
value. The construct permits a structured 
approach to producing a motion specification. 
It does not, however, provide for the use of 
other control constructs. 

A less structured method is also available. 
A typical example of the definition technique is 

motion_name [joint_id, key_frame_id] := 
    FRAME frame_id 
    POSITION angle_x, angle_y, angle_z 
    INTERPOLATE interpolation_technique 

The approach allows direct access to the motion 
variable. It can be employed within other 
constructs and it permits the use of iteration 
to produce the motion specifications, thus 
reducing the number of statements needed and the 
amount of work needed to define the 
specifications. 


3.4 IMPLICIT DEFINITION 

Once a motion has been defined, the 
specification can be viewed as a unit (motion 
primitive); it can thus function as a building 
block for the definition of more complex 
motions. A primitive motion algebra has been 
introduced for this use. For example, to create 
a motion which enables a human model to hop, 
skip, and jump 10 times, the following statement 
can be used: 

new_walk := 10 * (hop + skip + jump) 

'new_walk' is the resulting action. 'hop', 
'skip', and 'jump' are previously defined 
motions which allow a model to hop, skip and 
jump respectively. The constant 10 is the 
repetition factor that is applied to the actions 
'hop', 'skip', and 'jump'. 
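The motion algebra can be imitated in Python with operator overloading. This is only a sketch: the class `Motion` is hypothetical, a motion is reduced to a list of named steps, and `+` is assumed to mean that one motion follows the other in time.

```python
class Motion:
    """A motion primitive, reduced here to an ordered list of steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __add__(self, other):
        # Sequencing: the second motion follows the first.
        return Motion(self.steps + other.steps)

    def __rmul__(self, count):
        # Repetition: count * motion repeats the whole motion.
        return Motion(self.steps * count)


hop, skip, jump = Motion(["hop"]), Motion(["skip"]), Motion(["jump"])
new_walk = 10 * (hop + skip + jump)
```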

This technique relies heavily on the 
availability of predefined motion primitives. 
It is recognized that the explicit definition of 
such primitives can be both difficult and 
time-consuming; therefore, operations have been 
introduced which allow a more convenient 
definition of the motion primitives. For 
example, if a motion primitive (walk) exists 
which allows a model to walk, it may be 
desirable to employ the action of the legs in a 
different motion. The operation STRIP has been 
introduced, which creates a partial motion 
primitive from a given motion. In the statement 

walking_legs := STRIP walk, 5, 6, 7, 8, ... 

'walking_legs' is a new motion primitive which, 
when applied to a model, animates the legs. The 
values 5, 6, 7, 8, ..., are the joint 
identifiers present in the model's legs. Using 
this method, several motion primitives can be 
created that animate only portions of a given 
model. 

The operation SYNCHRONIZE has been 
introduced, which when given a set of partial 
motions, creates a new motion that animates 
portions of the model concurrently. The 
statement 

my_walk := SYNCHRONIZE walking_legs, 
swinging_arms 

defines a motion primitive (my_walk) which 
causes the figure to walk while swinging its 
arms. This approach allows an increase in the 
number of motion primitives, but at a much 
higher level of abstraction. 
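The two operations can be sketched in Python if a motion is represented as a mapping from joint identifier to that joint's key frames. The representation, the function names, and the decision to reject partial motions that share a joint are assumptions; the paper does not say how such a conflict is handled.

```python
def strip(motion, joint_ids):
    """Create a partial motion containing only the listed joints."""
    return {j: motion[j] for j in joint_ids if j in motion}


def synchronize(*partials):
    """Combine partial motions so that they animate the model concurrently."""
    combined = {}
    for partial in partials:
        for joint_id, key_frames in partial.items():
            if joint_id in combined:
                raise ValueError(
                    f"joint {joint_id} appears in more than one partial motion")
            combined[joint_id] = key_frames
    return combined


# Hypothetical motions: joints 1-2 belong to the arms, 5-6 to the legs.
walk = {1: ["a1"], 2: ["a2"], 5: ["l5"], 6: ["l6"]}
swing = {1: ["s1"], 2: ["s2"], 5: ["x5"]}

walking_legs = strip(walk, [5, 6])
swinging_arms = strip(swing, [1, 2])
my_walk = synchronize(walking_legs, swinging_arms)
```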

Such an approach to the creation of motion 
gives rise to a large number of operations which 
can tailor motion primitives to the needs of the 
current animation sequence. Possible operations 
include a scaling factor which stretches or 
shrinks a motion primitive temporally, a scaling 
factor which modifies the amplitude of a 
motion's movement, and a negation which reverses 
the order of the movements present in a motion 
primitive. These operations, as well as the 
operations * and +, have the advantage that 
intimate knowledge of the model's structure and 
motion definitions is not necessary. They 
allow the use of motion at a high level of 
abstraction. 


3.5 USE OF A MOTION DEFINITION 

Once a model is created and motions have 
been specified, the model can be animated with 
the statement 

ANIMATE model_name FROM start_frame TO end_frame 
USING motion_name 

The frame identifiers are given relative to the 
start of a scene. If the motion does not span 
the entire animation sequence, the motion will 
automatically repeat until the time span is 
covered. 

Several models of a class (i.e. models of 
identical structure) may be animated by the same 
motion. Models of the same class and with 
different-sized links will react identically. 
Models with an equal number but with different 
types of joints can still be driven by these 
motion specifications; however, the resulting 
model movements may be difficult to predict. 
The models place restrictions on their 
movements. For example, if a rotation is 
employed which violates a joint's extremes, the 
joint will enforce its extreme rotation limits 
over the motion specification given. This 
allows two models of the same class, but 
different restrictions, to behave realistically 
using the same motion set. 
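Two of the rules above, the automatic repetition of a motion and the enforcement of a joint's rotational extremes, can be sketched in Python. The function names are hypothetical, and the repetition is assumed to wrap around at the length of the motion.

```python
def motion_frame(scene_frame, start_frame, motion_length):
    """Frame within the motion that corresponds to a frame of the scene.

    Assumption: a motion shorter than the animated span restarts from its
    first frame each time it runs out.
    """
    return (scene_frame - start_frame) % motion_length


def clamp_to_limits(angle, min_angle, max_angle):
    """Enforce a joint's rotational extremes over the requested angle."""
    return max(min_angle, min(max_angle, angle))


# A motion of 8 frames applied from scene frame 5 onwards.
wrapped = motion_frame(25, 5, 8)

# The same requested rotation applied to two elbows with different limits.
stiff_elbow = clamp_to_limits(150.0, 0.0, 135.0)
loose_elbow = clamp_to_limits(150.0, 0.0, 160.0)
```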

Once all elements for a scene have been 
defined, it can be explicitly specified with 
static and dynamic components. A typical 
statement creating a scene is: 

SCENE scene_identifier 
    DISPLAY background, ... 
    DISPLAY any static models 
    ANIMATE model_name FROM ... 
    ANIMATE any dynamic models 
    movements which vary with time 
    (camera movements, panning, 
    zooming, etc.) 
ENDSCENE 

The resulting scenes are displayed by the 
statement: 

SHOOT SCENES 

The scenes are displayed in sequential 
order ranging from the lowest scene identifier 
to the highest one. The use of scene constructs 
is advantageous because the animation sequence 
can be broken into a set of independent scenes, 
thus allowing its construction on a piecewise 
basis. The creation of independent scenes also 
permits changes in the scene ordering without 
any knowledge of the frame identifiers involved. 


4.0 ENHANCEMENTS 

The system is designed as an extension to a 
host language, to be executed in a batch 
environment. A preprocessor translates all 
extended language statements into procedure 
calls. Other implementations, e.g. as a command 
language to be interpreted by an executive 
kernel, are possible. The batch environment 
creates problems in the use of both predefined 
models and motions. At present, these 
definitions must be made at the beginning of a 
program before they can be employed. Ideally, 
there should be a method to save any models and 
motions defined in a program. Subsequent 
programs would then need only load the 
definitions for models and motions before using 
them. This would allow the construction of a 
library of models and their movements. 

A problem with the use of library 
definitions is that unless a user has defined a 
model and its motions, or has previously worked 
with them, it is difficult to determine how they 
will appear. It would be useful to have a 
viewing utility which permits the viewing of 
predefined models and motions in an interactive 
environment. This would allow the user to 
determine exactly what had been previously 
defined and what further definitions must be 
made in the program. 


5.0 CONCLUSION 

The project has several advantages over 
existing systems. It permits the modelling and 
animation of figures at a high level of 
abstraction. Where many systems have problems 
working with rotations, this system deals with 
them effectively. The use of rotations 
enables the creation of generalized motions 
which can be applied to more than one model, 
while motion specifications using paths and 
forces can only be used on the specific models 
for which they were designed. Conventional 
interpolation works with articulate figures at the 
point level, since it only has access to the 
two-dimensional projection of a figure. The 
approach presented deals with interpolation on a 
rotational basis (i.e. the rotational extremes 
correspond to the key frames in two-dimensional 
interpolation), thus allowing the animation of 
three-dimensional figures. 


ABSTRACT 

This paper shows that constraint-based modeling, so far 
perceived primarily as a graphics technique for man-machine 
interaction, also provides a viable method for the modeling 
of complex surfaces. The idea of constraint-based modeling 
of three-dimensional shapes is described and illustrated by 
examples. Difficulties related to the practical application of 
this idea are discussed, and methods for overcoming them 
are outlined. The potential of the constraint-based approach 
for the modeling of shapes found in nature is indicated. 

RESUME 

Dans les systèmes infographiques à contraintes développés 
jusqu'à présent, les contraintes géométriques étaient utilisées 
surtout en qualité d'une technique d'interaction. Cependant, 
les mêmes contraintes peuvent former aussi une base pour 
modeler des objets à trois dimensions. Cet article présente 
l'idée principale du modelage à l'aide des contraintes et 
l'illustre avec des exemples. L'application de la méthode 
pour modeler des surfaces complexes est mise en évidence. 
Les problèmes numériques associés sont discutés, et une 
technique pour les atténuer par une décomposition 
hiérarchique du modèle est introduite. L'application 
potentielle de la méthode pour modeler des formes de nature est 
indiquée. 

Keywords: constraint-based modeling, polygon meshes, 
free-form surfaces. 


1. INTRODUCTION 

Of all the constraints of Nature, the most far-reaching are 
imposed by space. 

Peter Stevens, Patterns in nature . 

One common computer graphics technique for 
representing three-dimensional objects uses polygon meshes. 
A mesh is defined as a set of connected, polygonally 
bounded planar surfaces. Polyhedra are examples of 
meshes, but the notion of a mesh is more general. In 
particular, it also includes polygonal approximations of 
curved surfaces. 

A mesh description consists of a specification of 
vertices, edges and faces. Known methods of mesh description 
require the positions of all vertices to be explicitly specified 
in a system of coordinates [Foley and van Dam 1983]. This 
is convenient in many situations, for instance, when a mesh 
is rendered. However, other parameters may be more 
convenient to use when a mesh is modeled. For example, 
consider descriptions of a regular tetrahedron. In terms of edges 
its definition is trivial - the tetrahedron must have six edges 
of equal length. In contrast, the description of a regular 
tetrahedron in terms of vertices is far less intuitive, since 
their coordinates cannot be specified without arduous 
calculations. 

Mesh definition by specifying the lengths of edges falls 
into the category of constraint-based modeling. Instead of 
specifying vertices directly, a set of constraints, or relations 
between vertices, is defined. The idea of specifying 
geometric figures using constraints is not new to computer 
graphics. It was first implemented in Sketchpad [Sutherland 
1963], and followed in several other interactive graphics 
systems [Knuth 1979, Borning 1981, Van Wyk 1982, Nelson 
1985]. Due to its intuitive character, constraint-based 
modeling was used there primarily as the basis for 
man-machine interaction. The possibility of building a 
constraint-based system for the purpose of computer aided 
design was indicated by Lin, Gossard and Light [1981]. 

This paper presents a new application of constraint- 
based modeling - definition of complex three-dimensional 
shapes. 

2. UNSTRUCTURED MODELING 

Various types of constraints can be used when 
describing a mesh. For example, they may characterize vertices as 
co-linear or co-planar, specify areas of faces, fix the angles 
between edges and faces, etc. The mesh representation 
described in this paper uses distances between points as the 
main form of constraint. Additionally, lines can be specified 
as parallel to any plane of the system of coordinates (xy, xz 
or yz), and selected coordinates of vertices can be explicitly 
given. Explicit specification of some coordinates and 
directions is necessary to position a rigid object in space, so that 
it can neither translate nor rotate. Thus, the complete description 
of a mesh containing n vertices consists of: 

• A system of 3n equations with 3n unknown vertex 
coordinates. Each equation represents a constraint. 

• A list of edges expressed in terms of vertices. 

• A list of polygons expressed in terms of edges. 

An example of a constraint-based definition is given in 
Table 1. 

Table 1. A constraint-based definition of a regular tetrahedron. 

Specification                     Comment 
--------------------------------  ------------------------------------- 
x_A = y_A = z_A = 0               Vertex A lies in the origin 
                                  of the system of coordinates. 
y_B = z_B = 0                     Vertex B lies on the axis x. 
z_C = 0                           Vertex C lies on the plane xy. 
d(A,B) = d(B,C) = d(C,A) =        Distance between any two vertices 
d(A,D) = d(B,D) = d(C,D) = L      is equal to a given constant L. 
e1: A-B   e2: B-C   e3: C-A       List of edges. 
e4: A-D   e5: B-D   e6: C-D 
p1: e1-e3-e2   p2: e1-e5-e4       List of polygons. 
p3: e2-e6-e5   p4: e3-e4-e6 

In order to find the coordinates of the vertices, the 
system of constraints must be solved. Since the equations 
describing distances are quadratic, only numerical methods 
can provide a general solution. The results presented in this 
paper were obtained using the Newton method [Conte 1965]. 
For example, Fig. 1 shows a tetrahedron resulting from the 
description given in Tab. 1. Another example - a cubo- 
octahedron - is shown in Fig. 2. 
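The solution of the tetrahedron constraints of Tab. 1 by the Newton method can be sketched in pure Python. This is an illustration under stated assumptions, not the implementation used for the figures: the six explicitly given coordinates are substituted directly, leaving six unknowns; each distance constraint is written as a squared-distance equation; and the function names, the starting guess and the tolerance are all hypothetical choices.

```python
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")]
# Position of each free coordinate (vertex, axis) in the unknown vector.
SLOT = {("B", 0): 0, ("C", 0): 1, ("C", 1): 2,
        ("D", 0): 3, ("D", 1): 4, ("D", 2): 5}


def points(u):
    """Expand the six unknowns into vertices, applying the fixed coordinates."""
    xb, xc, yc, xd, yd, zd = u
    return {"A": (0.0, 0.0, 0.0), "B": (xb, 0.0, 0.0),
            "C": (xc, yc, 0.0), "D": (xd, yd, zd)}


def residuals(u, length):
    """One equation per edge: squared distance minus L squared."""
    p = points(u)
    return [sum((p[a][k] - p[b][k]) ** 2 for k in range(3)) - length ** 2
            for a, b in EDGES]


def jacobian(u):
    """Partial derivatives of every residual with respect to every unknown."""
    p = points(u)
    rows = []
    for a, b in EDGES:
        row = [0.0] * len(SLOT)
        for k in range(3):
            d = 2.0 * (p[a][k] - p[b][k])
            if (a, k) in SLOT:
                row[SLOT[(a, k)]] += d
            if (b, k) in SLOT:
                row[SLOT[(b, k)]] -= d
        rows.append(row)
    return rows


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular Jacobian: dependent constraints "
                             "or a non-rigid mesh")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def newton(u, length, tol=1e-12, max_iter=50):
    """Newton iteration on the constraint system, starting from the guess u."""
    for _ in range(max_iter):
        r = residuals(u, length)
        if max(abs(v) for v in r) < tol:
            return u
        step = solve_linear(jacobian(u), [-v for v in r])
        u = [ui + si for ui, si in zip(u, step)]
    raise RuntimeError("Newton iteration did not converge")


# Rough guess with C in the first quadrant and D above the xy plane.
solution = newton([1.0, 0.4, 0.8, 0.4, 0.3, 0.7], 1.0)
tetrahedron = points(solution)
```

The starting guess matters: a guess with D below the xy plane would be expected to lead to the mirror-image tetrahedron, which satisfies the same constraints.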

The approach to constraint-based modeling described 
above is called unstructured, because all constraints are 
combined into one large system of equations and solved 
simultaneously. In practice, this approach presents several 
difficulties. The first difficulty occurs when defining a mesh. 
If the constraints are not correctly chosen, the resulting mesh 
will not be rigid, or will contain dependent (redundant) 
constraints. In both cases, the Newton method will fail to 
provide a solution (the Jacobian determinant is equal to zero). 
Proper selection of constraints is a nontrivial task, because 
the rigidity of an object may depend on particular values of 
the edge lengths. A simple example, taken from [Hain 1967], 
illustrates this in the two-dimensional case (Fig. 3). 

Fig. 1. A tetrahedron described by Tab. 1. 

Even if the set of constraints is correct, its solution may 
be difficult to find: for a given set of initial values, the 
numerical method need not converge or it may converge to a 
wrong solution. This second situation occurs if more than 
one object satisfies the given constraints. Unfortunately, this 
is often the case. For example, even the simple description 
of a tetrahedron which has been presented in Tab. 1 allows 
for 8 different solutions: the base ABC can be placed in any 
of the four quadrants of the plane xy, and the vertex D can 
be located either above or below this plane. 

The following section describes a technique for 
overcoming these difficulties. 



Fig. 2. A cubo-octahedron. 



Fig. 3. Rigidity of an object may depend on 
particular values of distances. Planar meshes 
(a) and (b) differ only by the lengths of some 
edges; however, mesh (a) is rigid, while mesh 
(b) is not. 


3. STRUCTURED CONSTRAINT-BASED MODELING 

The correct solution can be more easily found if the 
mesh being defined can be thought of as the last element in 
a sequence of rigid submeshes. The first element in this 
sequence is a rigid object, called the kernel, simple enough 
to be properly defined and solved. The subsequent 
submeshes differ from each other by a few additional vertices 
and edges. Thus, the submesh $S_{i+1}$ contains $S_i$ as its proper 
subset. When calculating $S_{i+1}$, all vertices of $S_i$ are already 
known, so that at each stage only a small system of 
equations has to be solved. Consequently, the sequence of 
submeshes $S_i$ imposes a structure on the set of constraints. 
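A single stage of such a construction can be sketched in Python for the planar case. One new vertex is attached to two vertices that are already known, so the small system consists of two distance equations. For brevity the sketch solves them in closed form (intersection of two circles) rather than with the Newton method used in the paper; the function name `place_vertex` and the example values are hypothetical.

```python
import math


def place_vertex(p0, p1, r0, r1):
    """Return the two planar points at distance r0 from p0 and r1 from p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        raise ValueError("the two distance constraints cannot be satisfied")
    # Distance from p0 to the foot of the new vertex on the line p0-p1.
    a = (r0 * r0 - r1 * r1 + d * d) / (2.0 * d)
    h = math.sqrt(max(r0 * r0 - a * a, 0.0))
    mx, my = p0[0] + a * dx / d, p0[1] + a * dy / d
    return ((mx - h * dy / d, my + h * dx / d),
            (mx + h * dy / d, my - h * dx / d))


# Kernel: one edge of unit length. Stage 1 adds a third vertex at unit
# distance from both ends, giving an equilateral triangle.
kernel = [(0.0, 0.0), (1.0, 0.0)]
upper, lower = place_vertex(kernel[0], kernel[1], 1.0, 1.0)
```

Even this smallest stage has two solutions, one on each side of the known edge, so a construction of this kind still has to state which side is intended.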

As an example of the above idea, consider the 
construction of a pyramid from horizontal slabs. The definition 
of the slab is given in Fig. 4. After fixing the coordinates of 
the base, the construction progresses by placing consecutive 
slabs on top of each other, until the top point is formed (Fig. 
5). 

In the case of the pyramid, the length of the horizontal 
edges decreases from one slab to the next one by a constant 
value: $a_{i+1} = a_i - c$, $c > 0$. The length of the slanted edges 
$b_i$ is constant. By expressing the lengths of edges using 
other formulas, the pyramid can be deformed, and more 
complex shapes can be obtained. For example, Fig. 6 shows 
an Eiffel-Tower-like shape resulting from decreasing the 
length of the horizontal edges of the mesh by a constant 
factor: $a_{i+1} = a_i c$, $0 < c < 1$. Fig. 7 shows a dome-like shape 
obtained using septagonal slabs. In this case, the lengths of 
both the horizontal and the slanted edges are changed 
according to the formulas: 

$a_{i+1} = \sqrt{a_0^2 - c^2 i^2}$, $b_{i+1} = \alpha \sqrt{(a_i - a_{i+1})^2 + c^2}$ 

Values of the constants $\alpha$ and $c$ are chosen in such a way 
that the dome can be inscribed in a sphere. Finally, Fig. 8 
shows a vase obtained by changing the length of the 
horizontal edges according to the function $a_{i+1} = a_i + c \cos(i\omega)$, 
with $c, \omega > 0$. The length of the slanted edges is constant, 
and the slabs are septagonal. 
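The rules for the horizontal edge lengths can be compared with a few lines of Python. The sketch only generates the sequences of lengths; it does not build the slabs. The function name `edge_lengths` and all numeric values are hypothetical.

```python
import math


def edge_lengths(a0, count, rule):
    """Generate count horizontal edge lengths, where rule(a_i, i) gives a_(i+1)."""
    lengths = [a0]
    for i in range(count - 1):
        lengths.append(rule(lengths[-1], i))
    return lengths


# Pyramid: the length decreases by a constant value c = 2.
pyramid = edge_lengths(10.0, 5, lambda a, i: a - 2.0)

# Eiffel-Tower-like shape: the length decreases by a constant factor c = 0.5.
tower = edge_lengths(10.0, 5, lambda a, i: a * 0.5)

# Vase: the length changes by c * cos(i * omega), with c = 1 and omega = 0.8.
vase = edge_lengths(10.0, 5, lambda a, i: a + 1.0 * math.cos(i * 0.8))
```

Only the rule changes from one shape to the next; the slab definition and the stage-by-stage construction stay the same.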

Unfortunately, not every mesh can be decomposed into 
a sequence of rigid submeshes. Fig. 9 illustrates this in the 
two-dimensional case: the whole mesh is rigid, but it does 
not include any rigid submesh. Consequently, no kernel 
(other than the entire mesh) can be distinguished. 
Nevertheless, in many practical cases the decomposition is not only 
feasible, but it results in a straightforward way from the 
mesh description. This happens when graphic modeling can 
be thought of as a process similar to building a real-life 
construction. Usually, consecutive phases of a construction 
correspond to rigid objects, because a construction becoming 
rigid only in a late phase of development would be 
technologically difficult to make. Presumably, this argument 
applies not only to man-made constructions, for example 
found in architecture, but also to natural objects such as 
crystals or living organisms, which result from a growth 
process. 

Fig. 4. Definition of a slab. 

Fig. 5. Construction of a pyramid from slabs. 
(a) Construction in progress, (b) The final 
pyramid. 

Fig. 6. "Eiffel tower". 


Just as the varying length of horizontal segments 
determines the shape of the vase, a varying growth rate may 
determine a shape created by Nature. This point is best 
described by Stevens [1974]: 

No matter how we try, we cannot make a saddle from five 
equilateral triangles or a simple cup from seven... . Nature 
too is similarly constrained. She makes cups and saddles 
not as she pleases but as she must, as the distribution of 
material dictates... . If the perimeter of a shell grows at a 
faster rate than the center, the perimeter curls and wrinkles. 
No genes carry an image of how to place the wrinkles; no 
genes remember the shape of the shell; they only permit or 
encourage faster growth at the perimeter than at the center. 



Fig. 8. A vase. Figures (b) and (c) represent two different renderings of the polygon mesh (a). 

Fig. 9. Example of a planar mesh with no 
rigid submesh. 


Since structured constraint-based graphics modeling 
may imitate the natural process of growth, it appears to have 
great potential as a method for modeling shapes found in 
nature. An example of a shell modeled using the 
constraint-based approach is shown in Fig. 10. First, a 
planar section of the shell is "grown" by adding consecutive 
trapezoidal compartments (a). This provides a basis for 
creating three-dimensional "top" (b) and "bottom" portions 
of the shell. The two parts are connected together to form 
the complete polygon mesh (c). The final shape is shown in 
Fig. 10(d). 




Fig. 10. Construction of a shell. 


4. CONCLUSIONS 

This paper presents a new method for defining 
three-dimensional shapes. It relies on constraint-based modeling 
of polygon meshes. The definition of an object in terms of 
constraints can be more straightforward and simpler than 
other types of definitions. In order to find vertices of a 
mesh defined using constraints, a system of nonlinear 
equations must be solved. In the general case, only numerical 
methods can be used for this purpose. Solving the system of 
equations can be made easier if the mesh can be 
decomposed into rigid submeshes. Such decomposition is possible 
in many practical situations. Application of the 
constraint-based approach to the modeling of natural objects, for 
example flowers and shells, is an attractive topic open for further 
research. 

ACKNOWLEDGMENT 

This research was partially supported by grant number 
A0324 from the Natural Sciences and Engineering Research 
Council of Canada. 

REFERENCES 

Borning, A. [1981]: The programming aspects of Thinglab, a 
constraint-oriented simulation laboratory. ACM Trans. 
on Programming Languages 3, No. 4, pp. 353-387. 

Conte, S. D. [1965]: Elementary numerical analysis: An 
algorithmic approach. McGraw-Hill, New York. 

Foley, J. D., and van Dam, A. [1983]: Fundamentals of 
interactive computer graphics. Addison-Wesley, Reading. 

Hain, K. [1967]: Applied kinematics. McGraw-Hill, New 
York. 

Knuth, D. E. [1979]: TEX and METAFONT. Digital Press 
and American Mathematical Society, Bedford. 

Lin, V. C., Gossard, D. C., and Light, R. A. [1981]: 
Variational geometry in computer-aided design. Computer 
Graphics 13, No. 3, pp. 171-177. 

Nelson, G. [1985]: Juno, a constraint-based graphics system. 
Computer Graphics 19, No. 3, pp. 235-243. 

Stevens, P. S. [1974]: Patterns in nature. Little, Brown and 
Co., Boston. 

Sutherland, I. E. [1963]: Sketchpad, a man-machine 
graphical communication system. In 1963 Spring Joint 
Computer Conference, reprinted in Freeman H. (Ed.): 
Interactive Computer Graphics, IEEE Computer Soc. 
1980, pp. 1-19. 

Van Wyk, C. J. [1982]: A high-level language for specifying 
pictures. ACM Transactions on Graphics 1, No. 2, pp. 
163-182. 




The Stochastic Modelling of Trees 

Alain Fournier 
David A. Grindal 

Computer Systems Research Institute 
Department of Computer Science 
University of Toronto 
Toronto, Ontario 
M5S 1A4 


ABSTRACT 

We present here a fast method for the modelling of 
trees which brings together two interesting 
techniques. The trees are modelled as convex polyhedra 
for the description of the gross shape, and 
three-dimensional texture mapping is used for the detailed 
features. 

The "essential" volume of the tree is represented as 
the convex intersection of half spaces. The advantage 
of this representation is that it allows an adaptive 
level of detail in the display. We use a special 
algorithm for the display of the convex intersection which 
computes it directly in the frame buffer. The 
algorithm also allows the computation of intersecting 
polyhedra. 

To transform the convex polyhedra into a more 
realistic representation of trees, we use three-dimensional 
texture mapping to "modulate" the shape and the 
colour of the basic polyhedra. We then obtain an 
irregular nonconvex object, which is consistent in 
shape and general appearance regardless of the point 
of view and the size on screen. Three-dimensional 
fractional Brownian motion is one of the procedural 
textures used. 

KEYWORDS: tree modelling, half space intersection, 
stochastic modelling, frame buffer algorithms, 
adaptive modelling. 

RÉSUMÉ

Nous présentons ici une méthode rapide pour le modelage
des arbres qui réunit deux techniques intéressantes. Les
arbres sont modelés par des polyèdres convexes pour la
représentation de la forme globale, et une texture à trois
dimensions est utilisée pour modeler les détails.

La forme "essentielle" de l'arbre est réalisée par des
polyèdres convexes, résultats du calcul de l'intersection
de demi-espaces. L'avantage de cette représentation est
qu'elle permet un niveau adaptatif de détail. Nous avons
développé un algorithme pour le calcul de l'intersection
convexe directement dans la mémoire d'image. Avec une
légère modification, l'algorithme permet le calcul de
polyèdres qui s'intersectent.

Nous transformons les polyèdres en une représentation plus
réaliste des arbres en utilisant le "mapping" d'une texture
à trois dimensions pour moduler la forme et la couleur des
polyèdres de base. Nous obtenons ainsi un objet non convexe
et irrégulier, dont la forme et l'apparence générale sont
consistantes indépendamment du point de vue et de la taille
de l'arbre sur l'écran. Le mouvement brownien fractionnaire
à trois dimensions est une des procédures de génération de
texture utilisées.

MOTS CLÉS: modelage d'arbres, modelage stochastique,
intersection de demi-espaces, algorithmes de mémoire
d'image, modelage adaptatif.

1. Motivations 

Trees are obviously very important in the modelling
of natural scenes and landscapes. Problems are caused
by the large number of trees needed and their considerable
variety of shapes. The main criteria for a good
model are that it be realistic, easy to compute (both in
terms of the basic operations needed and of the time
complexity), flexible (capable of generating the intra-
and inter-species variations in shape), adaptive (generating
various levels of detail as needed) and compact.
Of course, depending on the application, one or
more of these criteria can be relaxed if not all can be
met. The techniques used so far include grammar generation
systems [AoKu84, Smit84], particle systems
[ReBl85], polygonal description plus two-dimensional
texture mapping [Bloo85] and simple volume primitives
plus two-dimensional texture mapping [Gard84,
Gard85]. The technique we will describe here, which
is close in spirit to the ones used by Gardner, uses
simple volume primitives (convex polyhedra) computed
in the frame buffer associated with stochastic
three-dimensional texture mapping. Table 1 gathers a
subjective evaluation of these different techniques
with regard to the above criteria. A scale of 0 (not at
all) to 5 (best possible) is used.

Refs.      Real.  Easy (ops)  Easy (time)  Flex.  Adapt.  Compact
[AoKu84]   4      2           2            4.5    3       4
[ReBl85]   4.5    2           2            3.5    2       4
[Bloo85]   4.5    2           2.5          1      1       1
[Gard84]   3.5    2           3            2      3       4
Here       2      4           4.5          3      4       4

Table 1. Subjective comparison of tree models

To achieve flexibility, we will use a mixture of generative
techniques, as in [AoKu84, Smit84], and stochastic
techniques, as in [FoFC82, Reev83, ReBl85]. The
goal of compactness will therefore be achieved, since
the actual description for each tree is very small. We
will have to pay special attention to the problems of
consistency, that is, keeping the appearance constant
as the level of detail, the point of view and the size on
screen of the displayed objects are changed.

It is highly desirable that the entire process of generating,
rendering, and colouring the tree(s) be done
in a reasonable time and with moderate amounts of
computing power. Since our goal here is not ultra-realism
but a balance between realism and time, we
hope to be able to approach the conditions for real-time
display. The system described here will not
create images in real time. However, it should be
possible, with hardware and minor software improvements,
to bring it close to or achieve real-time performance.


2. The three-dimensional shape 

Primitives to model three-dimensional objects range
from points to lines to polygons to higher degree surfaces.
Most of these have been used to model trees.
Polygons, because they are linear objects, and because
most rendering systems ultimately deal with polygons
at the display level, are a tempting choice. They have
many drawbacks, however. Many polygons are
needed to represent a complex shape, such as that of a
tree, and they constitute a very inflexible model, hard
to parametrise or modify adaptively. There is another
representation scheme which has most of the qualities
of polygonal models and some additional advantages.
The volume of the tree can be represented as the convex
intersection of half-spaces. A half-space is the region
of space all on one side of a plane. Formally, a half-space
$HS_i$ is the locus of points $(x,y,z)$ such that
$a_i x + b_i y + c_i z + d_i > 0$. If several half-spaces are
intersected, the result is $V = \bigcap_i HS_i$, or

$$V = \{\, (x,y,z) \in R^3 : \forall i \;\; a_i x + b_i y + c_i z + d_i > 0 \,\}.$$

This volume, usually enclosed, is convex, and its faces are
all convex polygons.


This form of representation is rather different from
any conventional means of storing three-dimensional
polyhedra. The most radical departure from the norm
is that it does not store the vertices of the polyhedra.
The only entities stored are the equations of the
intersecting planes.
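
As a minimal sketch of this representation, the following Python fragment stores only plane coefficients and tests whether a point belongs to the volume $V$. The names `Plane`, `inside` and `CUBE` are hypothetical and chosen for illustration; they are not taken from the system described here.

```python
from typing import List, Tuple

# Assumed storage format: one (a, b, c, d) tuple per plane, with the
# interior defined by a*x + b*y + c*z + d > 0 as in the text.
Plane = Tuple[float, float, float, float]


def inside(planes: List[Plane], point: Tuple[float, float, float]) -> bool:
    """Return True if the point satisfies every half-space inequality."""
    x, y, z = point
    return all(a * x + b * y + c * z + d > 0 for a, b, c, d in planes)


# Illustrative data: a cube of side 2 centred on the origin,
# described by six half-spaces and no vertices at all.
CUBE: List[Plane] = [
    (1, 0, 0, 1), (-1, 0, 0, 1),   # -1 < x < 1
    (0, 1, 0, 1), (0, -1, 0, 1),   # -1 < y < 1
    (0, 0, 1, 1), (0, 0, -1, 1),   # -1 < z < 1
]
```

Because the inequality is strict, points lying exactly on a bounding plane are treated as outside in this sketch.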


One benefit of storing planes is that there is added
information stored in the equation. For the plane
$ax + by + cz + d = 0$ the vector $(a,b,c)$ is the normal
to the plane. This fact will be used later, for the generation
of the trees. It turns out that by using the normals,
a user can create a wide range of trees easily
and quickly. If the same normals and the same
parameters are used, the procedure can also consistently
generate the same tree.

Another advantage of the half space representation is 
its flexibility. When the object is defined by a set of 
half spaces, it is possible to get a finer representation 
by splitting the planes. This splitting can be done to 
any one plane, without greatly affecting the total 
volume or overall shape. With a polygon mesh it is a 
difficult process, because the criteria to merge and 
split polygons are not obvious, and a change can affect 
many polygon boundaries. 

This scheme has another (minor) advantage over the
polygon mesh, in that the amount of storage needed
for the same polyhedron is a little less.

2.1. Generating the Tree 

Using the normals (i.e. planes) to generate the trees
gives more freedom in generating trees randomly.
Many schemes could be thought of for splitting normals,
in order to create a convex hull. In fact any
grammar can drive the process. We will only describe
one method here.

The principle is illustrated by Figure 1. The existing
normal $N_1$ defines the current plane $P_1$. The normal $N_1$
will be split into $N_2$ and $N_3$, which will define the
planes $P_2$ and $P_3$. It is desirable that the area, or
volume, described by $P_2$ and $P_3$ be approximately that
described by $P_1$. There should be some "natural"
breakdown of the normals so that the end result, after
several splits, is roughly the same as the original
plane.

In addition to the manner in which a given plane is
split in two, there is the further choice of which plane
is to be split. There are many possible rules which
could be followed here. The "oldest" plane could be
split each time. A "lifetime" could be assigned to each
plane, with a probabilistic chance of it being split
when its life is over (the most likely distribution here
would be a negative exponential), or planes could simply
be chosen at random. In the system described
here, the planes were split in generations. All the
planes were split at each stage. Thus all the resulting
planes are of the same "age", and there are always
$2^n N$ of them, where $n$ is the number of generations
and $N$ is the number of initial planes. This is
equivalent to applying the production rules of a parallel
grammar at each generation.

Figure 1 Splitting the Normal

The equation that governs the splitting of the normals
can be read off Figure 1. The normals should be
split so that in the average case:

$$l_2 \cos\alpha_1 = l_3 \cos\alpha_2 = |BC|.$$

Since this equality only holds in the average case,
there will be some random perturbation around the
exact values. Even with this equation restricting the
splitting method, there are still a great number of
parameters to control. The following algorithm was
used:

Step 1) Choose point B. This point will be the origin
        for the two new normals, $N_2$ and $N_3$.

        The parameters here are $\mu_B$ and $\sigma_B$. Point B
        will be chosen a distance down the normal
        from C: $|BC| = l_1 \mu_B + \mathrm{gauss}() \, \sigma_B$.

Step 2) Choose the angles at which the new normals
        will split from the present normal:

        $\alpha_1 = \mu_\alpha + \mathrm{gauss}() \, \sigma_\alpha$
        $\alpha_2 = \mu_\alpha + \mathrm{gauss}() \, \sigma_\alpha$

Step 3) Choose the length of the two new normals:

        $l_2 = |BC| / \cos\alpha_1 + \mathrm{gauss}() \, \sigma_l \, l_1$
        $l_3 = |BC| / \cos\alpha_2 + \mathrm{gauss}() \, \sigma_l \, l_1$

        The lengths are designed so that the end of
        each new normal is in the plane $P_1$.

Step 4) Reduce the angle at which the new normals
        are created:

        $\mu_\alpha = \mu_\alpha \cdot \mathrm{ratio}$
        $\sigma_\alpha = \sigma_\alpha \cdot \mathrm{ratio}$

        This maintains the user's control over the
        creation process. If the splitting angle were
        not reduced, then the normals resulting
        after two or three levels of recursion would
        have no resemblance to the original.
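
The four steps can be sketched in Python as follows. This is a two-dimensional illustration under stated assumptions, not the authors' code: a normal is held as a hypothetical `Normal(origin, tip)` pair, `gauss()` is taken to be a standard normal deviate, the two new normals are placed on opposite sides of the present one, and every generation splits all normals (so $n$ generations of $N$ initial normals give $2^n N$ results).

```python
import math
import random
from typing import List, NamedTuple, Tuple

Vec = Tuple[float, float]


class Normal(NamedTuple):
    origin: Vec  # base of the normal
    tip: Vec     # end of the normal (point C), lying in its plane


def rotate(v: Vec, angle: float) -> Vec:
    """Rotate a 2-D vector counter-clockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)


def split(n: Normal, mu_b: float, sigma_b: float, mu_a: float,
          sigma_a: float, sigma_l: float,
          rng: random.Random) -> Tuple[Normal, Normal]:
    l1 = math.dist(n.origin, n.tip)
    u = ((n.tip[0] - n.origin[0]) / l1, (n.tip[1] - n.origin[1]) / l1)
    # Step 1: B lies a distance |BC| down the normal from its tip C.
    bc = l1 * mu_b + rng.gauss(0.0, 1.0) * sigma_b
    b = (n.tip[0] - bc * u[0], n.tip[1] - bc * u[1])
    # Step 2: splitting angles measured from the present normal.
    a1 = mu_a + rng.gauss(0.0, 1.0) * sigma_a
    a2 = mu_a + rng.gauss(0.0, 1.0) * sigma_a
    # Step 3: lengths chosen so that, on average, the new tips stay in P_1.
    l2 = bc / math.cos(a1) + rng.gauss(0.0, 1.0) * sigma_l * l1
    l3 = bc / math.cos(a2) + rng.gauss(0.0, 1.0) * sigma_l * l1
    d2, d3 = rotate(u, a1), rotate(u, -a2)
    return (Normal(b, (b[0] + l2 * d2[0], b[1] + l2 * d2[1])),
            Normal(b, (b[0] + l3 * d3[0], b[1] + l3 * d3[1])))


def grow(initial: List[Normal], generations: int, mu_b: float,
         sigma_b: float, mu_a: float, sigma_a: float, sigma_l: float,
         ratio: float, seed: int = 0) -> List[Normal]:
    """Split every normal once per generation."""
    rng = random.Random(seed)
    normals = list(initial)
    for _ in range(generations):
        normals = [child for n in normals
                   for child in split(n, mu_b, sigma_b, mu_a,
                                      sigma_a, sigma_l, rng)]
        # Step 4: reduce the splitting angle for the next generation.
        mu_a *= ratio
        sigma_a *= ratio
    return normals
```

With all the sigma parameters set to zero the split is deterministic and both new tips fall exactly in the plane of the parent, which is a convenient way to check the geometry before adding randomness.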

Since the representation being used is that of convex
intersection, it is possible for one errant plane to chop
the tree in half. This occurs if a normal is split far
enough away from its predecessor's original direction.
The problem is roughly similar to that of self-intersection
in two-dimensional stochastic interpolation.
Figure 2 demonstrates how this can happen if
the normals split in just the wrong way. In fact, the
problem occurs more often if the splitting is taking
place in three dimensions (as is being done) instead of
two dimensions (as is being shown). In the diagram
the seven "outside" normals have been shortened for
the sake of clarity.



Figure 2 The Effect of One Errant Normal 

In order to prevent this from occurring, one more restriction
is added to the creation process. The procedure
keeps only the normals that point "outwards".
The algorithm ensures that if a normal's direction is
into a certain octant, the origin of the normal is
also in that octant. If the normal is $(a,b,c)$ and its
point of origin is $(x,y,z)$, then the normal is retained if
and only if

$ax \ge 0$ AND $by \ge 0$ AND $cz \ge 0$.

This process of pruning the normals is demonstrated
in two dimensions by Figure 3. In this diagram, normals
$b$, $c$ and $d$ would be retained, whereas $a$ and $e$
would be rejected. Using this pruning method it can
be seen that an occurrence such as that in Figure 2 is
not possible. This means that the convex hulls should
be fairly well proportioned. One "bad" normal cannot
cut away half of the volume.
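
The retention test is simple enough to state directly in code. The sketch below assumes the non-strict form of the inequalities and uses hypothetical names (`points_outwards`, `prune`); normals are paired with their points of origin.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def points_outwards(normal: Vec3, origin: Vec3) -> bool:
    """True if each component of the normal agrees in sign with the
    matching coordinate of its origin (a zero product is accepted)."""
    return all(n * o >= 0 for n, o in zip(normal, origin))


def prune(normals: List[Tuple[Vec3, Vec3]]) -> List[Tuple[Vec3, Vec3]]:
    """Keep only the (normal, origin) pairs that pass the octant test."""
    return [(n, o) for n, o in normals if points_outwards(n, o)]
```

For example, a normal (1, 2, 3) starting at (4, 5, 6) is kept, while (-1, 2, 3) starting at the same point is rejected because its x component points back across the origin.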

The creation procedure lets the user define any
number of normals to start. Empirically, it turns out
that beginning with three to six normals gives the
best results. This process gives a large amount of control
over the result. If the input included a long vector,
the result was usually a long thin tree. The input
angles are additional parameters which permit wide
control of the overall shape. In fact the sample space
is large enough that it has not yet been fully explored.



Figure 3 Example of Pruning the Normals

2.2. The Half-Space Intersection Algorithm

We now have a collection of planes to model the tree.
What is needed is a visible-surface algorithm for the
intersection of half-spaces. The problem of finding the
convex intersection of half-spaces has been explored
by Brown, among others [Brow79] and is O(N).
Although his method was not a visible surface algorithm,
it could be adapted to this purpose. However,
Brown treated the problem as one of geometry, not of
graphics, and his solution is in world space. A visible
surface algorithm for convex intersection that uses the
frame buffer was presented in [FoFu86]. It is similar
in many ways to the standard Z-buffer algorithm used
for many polygon based systems. In the terms of
[FoFu86], each pixel needs two registers. With a
large frame buffer, providing enough bits for two
registers is not too difficult as long as the stored
values can be bounded.

At the beginning of the algorithm, in Pass 0, the
value of current-back is set to the farthest possible
value. This represents the background depth. The
back-facing planes are scanned out first. If a plane is
in front of the current farthest-forward back-facing
plane, then that depth is stored for that pixel. For the
sake of clarity the equation for z was used in the
description of the algorithm, but in practice the
depth value is calculated incrementally at each pixel.
Thus the calculation costs only one addition for each
point.

The same process is followed for the front-facing
planes. Each is scanned out incrementally and at each
pixel the depth is compared to the current depth. If
this plane is further back than the old one, then it
becomes the current depth. However, if the plane is
behind the most forward back-facing plane, then that
pixel is not in the convex intersection. This is indicated
by placing the same depth in both the current
front and back registers. All the points at which this
occurs are then set to the background colour in a
quick Pass 3. Note that this third pass only scans the
screen once, as did Pass 0.

The above procedure leaves on the screen the depth
values for each visible point of the convex intersection.
A fourth pass coloured the polyhedron for the
purpose of Figure 4.



Figure 4 Example of Convex Intersection 

Pass 0
    For all pixels
        current-back = MAXDEPTH

Pass 1
    For each back-facing plane (c > 0)
        z = -(a/c) x - (b/c) y - d/c
        if z < current-back then
            current-back = z

Pass 2
    For each front-facing plane (c < 0)
        z = -(a/c) x - (b/c) y - d/c
        if z > current-front then
            if z < current-back then
                current-front = z
            else
                current-front = current-back

Pass 3
    For all pixels
        if current-back = current-front then
            Colour = Background-colour

There is one serious problem with the algorithm as
defined: each plane must be scanned out across the
entire screen. One can easily assume frame buffer
hardware that accomplishes that in constant time. In
fact this is very close to the algorithms used in
Pixel-Planes [FGHS85]. As the current system was implemented
with a general purpose graphics processor,
this could be a limit on the performance of the algorithm.
A way to avoid this extra work is evident from
classic graphics algorithms. The polyhedron has some
maximum and minimum x and y values on the
screen. Simply "box" the polyhedron and only scan out
the planes inside the box. Boxing the solid, however,
leads to a new problem. The box is not quickly determined
from a set of plane equations. The solution we
adopted is to create the box dynamically. The first
plane or two will be scanned out normally. By the
third or fourth plane, there will be scanlines on which
no part of the convex hull can possibly be. For example
a back-facing plane could have cut in to a depth
less than zero (i.e. behind the screen). In practice this
eliminates a great many scanlines from consideration.
The same process applies vertically. If the equations
of the original planes are retained, then some beginning
box can be computed from these. Since the
number of initial planes is small (from 3 to 6) this is
easy, and it has only to be done once. Then as the
program runs, the box will be shrunk dynamically.
The combination of the two boxing methods is quite
efficient.
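
The passes (Pass 0 to Pass 3) can be sketched in Python as below. This is an illustration under explicit assumptions rather than the implementation of [FoFu86]: the interior of each half-space is taken as $ax + by + cz + d > 0$, depth grows away from the viewer, so planes with $c < 0$ bound the far side and planes with $c > 0$ bound the near side (the sign test flips if outward normals are stored instead). The initial value of the front register, the handling of edge-on planes ($c = 0$) and the use of `None` for background pixels are additions made for the sketch, and no boxing is attempted.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Plane = Tuple[float, float, float, float]  # (a, b, c, d)
MAXDEPTH = 255.0  # farthest depth (assumed range)
MINDEPTH = 0.0    # nearest depth (assumed initial front value)


def convex_intersection(planes: Sequence[Plane], width: int,
                        height: int) -> List[List[Optional[float]]]:
    # Pass 0: initialise the two per-pixel registers.
    back = [[MAXDEPTH] * width for _ in range(height)]
    front = [[MINDEPTH] * width for _ in range(height)]

    def scan(plane: Plane) -> Iterator[Tuple[int, int, float]]:
        """Yield (x, y, z); along a scanline z costs one addition."""
        a, b, c, d = plane
        dz_dx = -a / c
        for y in range(height):
            z = -(b * y + d) / c
            for x in range(width):
                yield x, y, z
                z += dz_dx

    # Pass 1: far-side planes; keep the nearest of them per pixel.
    for plane in planes:
        if plane[2] < 0:
            for x, y, z in scan(plane):
                if z < back[y][x]:
                    back[y][x] = z

    # Pass 2: near-side planes; keep the farthest, or mark the pixel empty.
    for plane in planes:
        a, b, c, d = plane
        if c > 0:
            for x, y, z in scan(plane):
                if z > front[y][x]:
                    front[y][x] = z if z < back[y][x] else back[y][x]
        elif c == 0:
            # Edge-on plane: a pure 2-D clip (not covered by the source).
            for y in range(height):
                for x in range(width):
                    if a * x + b * y + d <= 0:
                        front[y][x] = back[y][x]

    # Pass 3: pixels whose registers met or crossed are background.
    return [[front[y][x] if front[y][x] < back[y][x] else None
             for x in range(width)] for y in range(height)]
```

Calling `convex_intersection` with six axis-aligned planes returns a depth map that holds the near-face depth inside the box and `None` elsewhere; the same map is what a later colouring pass would read.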

The half-space intersection algorithm then will take
the output from the creation program to give a visible
surface and depth values. The algorithm from
[FoFu86] can be generalized to work on several convex
hulls during the same run. The generalization
only requires another register. This gives a total of
three registers, which causes a problem for most
frame buffers. As the entity stored represents depth,
three registers in a 24-bit frame buffer means only
256 units of depth per register. This is not a great
deal of room to work with. But it is only a temporary
hardware limitation. We expect most future frame
buffers to be more generous in bits per pixel. In fact
there are already some with 48 bits, like the Pixar
[LePo84].

The multiple convex intersection algorithm works
very much like the single. The polyhedra are processed
individually. This takes up the same two registers
as before for current-back and current-front. The
difference is that after a polyhedron is finished, it is
then merged with those already scanned out. At each
point, the depth of the just created surface is compared
to the depth of the surface already there, if any.
The surface closer to the viewer is kept in the third
register. It should be noted that this algorithm not
only allows multiple convex hulls, but that the hulls
may actually intersect each other and the correct
result will be obtained.
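
A small sketch of the merge step and of the register budget follows. It assumes each finished polyhedron is available as a depth map in which empty pixels are `None` and smaller depths are closer to the viewer; the function names are hypothetical.

```python
from typing import List, Optional

DepthMap = List[List[Optional[float]]]


def merge(nearest: DepthMap, polyhedron: DepthMap) -> DepthMap:
    """Fold one finished polyhedron into the third register, keeping
    at each pixel whichever surface is closer to the viewer."""
    for y, row in enumerate(polyhedron):
        for x, z in enumerate(row):
            if z is not None and (nearest[y][x] is None or z < nearest[y][x]):
                nearest[y][x] = z
    return nearest


def depth_levels(bits_per_pixel: int, registers: int) -> int:
    """Distinct depth values per register when the pixel is split evenly."""
    return 2 ** (bits_per_pixel // registers)
```

With 24 bits per pixel and three registers, `depth_levels(24, 3)` gives the 256 units of depth mentioned above; with 48 bits the same split gives 65536.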



Figure 5 Example of Multiple Convex Intersection 

The algorithms described so far result in an adaptive
convex polyhedral shape being written into the frame
buffer for each tree. To make this shape more realistic,
several methods can be used. One is to use stochastic
interpolation [PiFo84, FoMi85] to "roughen"
the hull by adding stochastic variations to the depth
of the visible faces. That will create rough (non-convex)
edges, and possibly holes in the shape of the
tree [Grin84]. Another is to use three-dimensional
texture mapping. This is the technique that will be
described in the next section, but it should be noted
that the two can be, and have been, used concurrently.

3. Three-dimensional Texture Mapping 

Texture mapping in two dimensions is a simple and
powerful idea that has a long history in computer
graphics [Catm74, BlNe76, Blin78]. More recently
the idea was generalized to three-dimensional texture
[Perl85, Peac85]†. What is needed is a texture solid,
and a method to map it to the screen. The texture
solid can be created by any process desired, as in the
two-dimensional case for the texture tile. The cube
can be pre-computed, run-time computed, hand-drawn,
or digitized from a real image. The creation is a process
separate from the mapping. The mapping itself is
simple in principle. The face of the object to be
mapped has a set of coordinate values for its position.
At each point on the object face, the $(x,y,z)$ coordinate
values are mapped into the $(i,j,k)$ values of the texture
cube.

One problem inherent to the idea of a three-dimensional
texture map is the sheer amount of storage
necessary to hold the texture cube. One solution to
the problem is to recognize that the frame buffer itself
is a large block of memory. Assume a 32-bit frame
buffer, not unreasonable by today's standards. This
means that 32 bits of information are needed at each
point in the cube. If the texture cube is stored in the
top eight bits of each pixel, then four screen pixels
store one texture pixel. Thus a 32 × 32 × 32 element
texture cube would take up $(2^5)^3 \cdot 2^2 = 2^{17}$ screen
pixels. A 512 × 512 frame buffer contains $2^{18}$ points. It can be
seen that even a sizable texture cube stored only in
the top bits will easily fit into the frame buffer. By
taking only the top eight bits, the lower 24 are left.
Thus, the normal red, green and blue planes are
untouched.

A second difficulty with three-dimensional texture
mapping is that of aliasing. This problem occurs, as it
does in the two-dimensional case, when a large scale
difference between the texture cube and the object
being mapped causes sampling problems. Solutions
used in two-dimensional texture mapping can be
applied here too. In particular the MIP map technique
[Will83] directly translates to three dimensions.
As in the two-dimensional case the texture tile is
repeatedly replicated at half resolution. Initially the
texture cube takes up half of the 512 × 512 × 8-bit buffer. If
the cube is averaged into a cube half its length per
side, it will only be one eighth of the size of the original.
This process can be repeated and the eventual
result will not even fill the buffer. This form of
pre-computed averaging is a viable solution for at least
some of the aliasing problems.
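
The claim that the repeated averaging never fills the buffer can be checked with a geometric series. Assuming, as stated, that the full-resolution cube occupies one half of the buffer and that each coarser level needs one eighth of the storage of the level before it, the total fraction of the buffer used is bounded as follows:

```latex
\frac{1}{2}\left(1 + \frac{1}{8} + \frac{1}{8^{2}} + \cdots\right)
  \;<\; \frac{1}{2}\cdot\frac{1}{1-\frac{1}{8}}
  \;=\; \frac{4}{7} \;<\; 1
```

Since only finitely many levels exist in practice, the true total is strictly below four sevenths of the buffer.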


† The work described here was completed before these papers appeared, and
thus our use of three-dimensional texture was developed independently. See
[Grin84].

The fact that only the top bits of the frame buffer are used
by the texture cube has an important consequence. If the
rest of the processing does not require the upper eight
bits, then the texture cube can be pre-processed and
read in before beginning the rest of the work. This
means a considerable saving in run-time. Unfortunately,
in this implementation the frame buffer contained
only 24 bits per pixel. Since the half-space
intersection algorithm needed 24 bits, this meant that
the texture cube had to be read in only when it was to
be used. This is yet another incentive to get as many
bits in a frame buffer as you can afford. You will
always find uses for them.

This method of storing the cube leads to a simple
mapping function. Assume that a $2^n$ element-per-side
cube is stored in a word-addressable 512 × 512 frame
buffer. If the screen address is $(x,y)$ with a depth
value of $z$, then the mapping is simply

$$\mathrm{addr} = (x + y \cdot 2^n + z \cdot 2^{2n}) \cdot 4.$$

In other words, the texture cube is treated as a large
three-dimensional array. To find the exact $(i,j)$ position
in the frame buffer, the result above is split bitwise.
The lower 9 bits are the $i$ position; the upper 8
bits are the $j$ position. The multiplication by 4 is
because 4 screen pixels store one texture point. Thus
the mapping takes only three shifts and two additions
per point.

The inputs $(x,y,z)$ to the mapping function above must
be contained in the cube. That is, with the above
assumptions, $0 \le x,y,z < 2^n$. To achieve this all that
needs to be done is take the original $(x,y,z)$ values
modulo $2^n$. This is equivalent to creating a large
enough texture cube by repeating the smaller one
over and over. It should be noted that because of the
nature of the three-dimensional cube, it is unlikely
that there will be some undesirable macroscopic pattern
created by this repetition, as often occurs in the
two-dimensional case. This is so because in the
two-dimensional case, the same picture is repeated
exactly. With a texture cube, this can only occur if the
surface is at the same angle and position across
several cubes, which is less likely to happen.
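
The address computation and the wrap-around can be sketched together. The function below is illustrative (the name `texture_address` and the default `n = 5`, i.e. a 32-element-per-side cube, are assumptions); it uses a bit mask for the modulo, three shifts and two additions for the address, and a 9-bit split for a 512-pixel-wide frame buffer.

```python
from typing import Tuple


def texture_address(x: int, y: int, z: int, n: int = 5) -> Tuple[int, int, int]:
    """Map integer (x, y, z) to (addr, i, j) for a cube of 2**n per side."""
    mask = (1 << n) - 1
    # Taking the coordinates modulo 2**n repeats the cube through space.
    x, y, z = x & mask, y & mask, z & mask
    # addr = (x + y*2^n + z*2^(2n)) * 4: three shifts and two additions.
    addr = (x + (y << n) + (z << (2 * n))) << 2
    i = addr & 0x1FF   # lower 9 bits: position along a scanline
    j = addr >> 9      # remaining upper bits: scanline number
    return addr, i, j
```

For `n = 5` the largest address is `texture_address(31, 31, 31)[0]`, which stays below $2^{17}$ and so agrees with the storage estimate of $2^{17}$ screen pixels for a 32 × 32 × 32 cube.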

4. Mapping the Texture to the Polyhedra 

In effect three-dimensional texture mapping allows
the faces of the polyhedra we have defined previously
to determine the boundaries of the tree in the texture
space. The three-dimensional texture cube can be
generated by randomly placing small "chunks" of
colour in three-space. The colours are chosen by the
user, as well as the number of chunks and the percentage
of each colour. It can also be generated using
three-dimensional fractional Brownian motion
[MaVN68, FoFC82], or another suitable procedural
texture.

Each pixel which displays a part of the tree contains
three coordinates: the $(x,y)$ position on the screen and
the $z$ value in the frame buffer. Each of these points is
put through the inverse of the transformation applied
to the objects to give the $(x,y,z)$ real-world coordinates
which will then be used as indices into the texture
cube as described above. This process gives the consistency
of colour desired and is also done with reasonable
speed. The important point from the point of
view of efficiency is that the mapping can be done
incrementally.

The colouring of the tree is done in one pass through
the screen. Each point is put through the inverse
transform, and then mapped into the texture cube. At
the top of the screen a current position in world space,
$(x_c, y_c, z_c)$, is calculated. This is obtained by putting the
first point through the transform. Let the transformation
be $M: R^3 \to R^3$. Then for some $\Delta x$, since $M$ is
linear,

$$M(x+\Delta x, y, z) = M(x,y,z) + M(\Delta x, 0, 0) = (x_c, y_c, z_c) + \Delta M_x.$$

$\Delta M_x$ is a constant for a constant $\Delta x$. This, of course,
generalizes to $\Delta M_y$ and $\Delta M_z$. If the depth value
changes non-linearly in the frame buffer, as it would
if the tree has been stochastically "roughened", then
an increment for a changing $z$ value is needed.
Again, the linearity of $M$ allows an incremental computation
of $M(x+\Delta x, y, z+\Delta z)$.
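
A sketch of the incremental evaluation along one scanline is given below. It assumes $M$ is a purely linear map held as a 3 × 3 row-major matrix (any translation would be absorbed into the first point), takes $\Delta x = 1$ per pixel, and accepts the per-pixel depths so that a roughened surface with varying $\Delta z$ is handled as well. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, ...]
Matrix = Sequence[Sequence[float]]  # 3 x 3, row-major


def apply(m: Matrix, v: Sequence[float]) -> Vec3:
    """Full matrix-vector product, used only for the first pixel."""
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def scanline_world_coords(m: Matrix, x0: float, y: float,
                          depths: Sequence[float]) -> List[Vec3]:
    """World coordinates of pixels (x0 + i, y, depths[i]), incrementally."""
    if not depths:
        return []
    dm_x = apply(m, (1, 0, 0))   # constant step for one pixel in x
    dm_z = apply(m, (0, 0, 1))   # step per unit change of depth
    cur = apply(m, (x0, y, depths[0]))
    out = [cur]
    for prev_z, z in zip(depths, depths[1:]):
        dz = z - prev_z
        cur = tuple(c + sx + dz * sz for c, sx, sz in zip(cur, dm_x, dm_z))
        out.append(cur)
    return out
```

When the depth is constant along the scanline the inner update reduces to three additions per pixel, which is the saving the text refers to.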

With this use of three-dimensional texture mapping,
the tree has been coloured, with the ability to
both reproduce the shading in place, and shade it
correctly as the viewpoint moves. This was all accomplished
with reasonable speed.

5. Adding the Trunk 

Now that the crown of the tree has been shaped and 
shaded, the trunk of the tree is to be added. Part of 
the information stored during the processing of the 
three dimensional crown is the position of the centre 
of the tree. This is usually the base of one of the plane 
normals generated or given in the creation of the tree 
by splitting normals. After the centre position is put 
through its transforms, the resulting depth value lets 
a perspective mapping be done which scales the trunk 
to a size that fits the rest of the tree. This perspective 
mapping is a standard transform. 

To shade the trunk, a modified version of Blinn's
wrinkled surface technique was applied [Blin78]. The
trunk is given a base colour, usually some dark
brownish-red. Then ranges are given for each of the
component colours (red, green, blue). A random
amount within that range is added to the base colour
at each point. For example, if the base colour is
(60,30,10) and the ranges are (40,20,10), then the
colour at each point of the trunk would be an r,g,b triple
with red $\in (60,100)$, green $\in (30,50)$, and blue
$\in (10,20)$. Values are uniformly distributed within
these ranges. When the parameters were chosen well,
this scheme gave a very acceptable simulation of tree
bark. This method also lets different kinds of trees be
modelled properly. Poplars, for example, have a
smooth, light-green coloured bark, oaks a rough
brown bark.
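
The colouring rule for the trunk amounts to a few lines of code. The sketch below assumes integer colour components and draws the offset with Python's `random.Random.randint`, which includes both end points; the name `bark_colour` is hypothetical.

```python
import random
from typing import Tuple

RGB = Tuple[int, int, int]


def bark_colour(base: RGB, ranges: RGB, rng: random.Random) -> RGB:
    """Add a uniformly distributed offset within each range to the base."""
    return tuple(b + rng.randint(0, r) for b, r in zip(base, ranges))
```

With the base (60,30,10) and ranges (40,20,10) of the example, every colour returned has red between 60 and 100, green between 30 and 50, and blue between 10 and 20.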

It remains to determine the visibility between
crowns and trunks. One possibility is to use a reverse
painter's algorithm. The trunks are painted after all
the crowns, from front to back, and they are not
painted over anything already there. If the point of
view is from above, such that every crown has priority
over every trunk, this will give the correct priority.



Figure 6 Completed Three Dimensional Tree 

A more general method is to model the trunks as convex
polyhedra, and use the algorithms applied to the
crowns. The trunk can be described as a hexagonal
cylinder, or cone, rendered with the half-space
intersection algorithm, and coloured as before. This
approach gives an exact solution to the visibility problem,
but adds seven or eight planes to scan out for
each tree. It is not a large additional burden, especially
since boxing is easier and more efficient given
the shape of the trunk. It should be mentioned here
that in our context we do not worry about modelling
branches.

6. Implementation issues 

We will describe in this section how the system was
implemented, give numbers to indicate the system
performance, and discuss ways in which this performance
can be improved.

6.1. Implementation Description 

The work of the system is split between two
machines. The mainframe is a DEC VAX 11/780 running
UNIX†. The other machine is an ADAGE RDS-3000
Graphics Processor and Raster Display System.
This is a modular system with its own bus and it is
interfaced to the VAX. The ADAGE bus is synchronous
with a 32-bit data path. The basic cycle time is
200ns. The frame buffer is 512 by 512 pixels, each
with 24 bits. It can also be organized in a 1K by 1K
mode with 6 bits per pixel. Much of the power of the
ADAGE comes from the use of the 200ns cycle, 32-bit,
bit-slice processor. The processor is supported by a
4K by 64-bit wide microcode memory and an 8K by
32-bit wide scratchpad memory. The processor also
includes a 16 × 16 bit hardware multiplier which does
a signed multiplication in two cycles (400ns). The
code for the graphics processor was written in a C-like
language for a compiler developed at the University of
North Carolina [Bish82]. While allowing only integer
arithmetic, this language was of immeasurable help
to the implementation.

† UNIX is a trademark of Bell Laboratories

Almost all of the actual processing work for the tree
creation system implemented was done on the
ADAGE bit-slice processor (herein called simply
the Adage). The VAX processor was used only as a
driver, loading microcode into the Adage and starting
the routines, and to perform the basic geometric
operations of splitting the normals.

In some parts of the system it was necessary to do
non-integer arithmetic on the Adage. The best example
of this was the convex intersection routine. A
series of fixed point routines (one 16-bit word for the
integer part and one 16-bit word for the fraction) were
implemented. In addition to needing non-integer
arithmetic, several of the Adage routines needed random,
or at least pseudo-random, numbers. We used a
multiplicative congruential routine to generate the
pseudo-random numbers. To ensure that the routine
did not loop, a new random seed was used every 512
iterations. This method did not consume excessively
large amounts of time to feed seeds down to the
Adage but did generate satisfactory random numbers
for the Adage routines.
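
Both ideas can be illustrated with integer-only Python. The sketch below is not the Adage microcode: the 16.16 fixed point helpers follow the word layout described above, but the multiplier and modulus of the generator are the well-known Lehmer "minimal standard" pair, chosen here purely as an assumption because the constants actually used are not given. Seeds are drawn from a host-side iterator every 512 outputs.

```python
from typing import Iterator

FRAC_BITS = 16          # 16-bit integer part, 16-bit fraction
ONE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    return int(round(value * ONE))


def to_float(fixed: int) -> float:
    return fixed / ONE


def fixed_mul(a: int, b: int) -> int:
    # The raw product carries 32 fraction bits; drop 16 of them.
    return (a * b) >> FRAC_BITS


# Assumed constants (not from the paper): x' = 16807 * x mod (2^31 - 1).
MULTIPLIER = 16807
MODULUS = 2 ** 31 - 1


class ReseededGenerator:
    """Multiplicative congruential generator, reseeded periodically."""

    def __init__(self, seeds: Iterator[int], reseed_every: int = 512) -> None:
        self._seeds = seeds
        self._reseed_every = reseed_every
        self._count = 0
        self._state = 1

    def next(self) -> int:
        if self._count % self._reseed_every == 0:
            # Fetch a fresh seed from the host; 0 would lock the sequence.
            self._state = next(self._seeds) % MODULUS or 1
        self._count += 1
        self._state = (MULTIPLIER * self._state) % MODULUS
        return self._state
```

A generator built with `ReseededGenerator(iter(host_seeds))` consumes one host seed per 512 numbers, so the traffic from the driver to the graphics processor stays small.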

6.2. System performance 

Detailed timing information can be found in [Grin84].
For the icosahedron of Figure 4, with an initial box of
512 × 512, the rendering takes roughly 12 seconds.
These assumptions give a time of approximately
50 µsec per pixel, or about 0.7 to 0.8 seconds per plane.
The roughening step, if applied, takes about 1.0 second
for a 250 × 250 pixel object. The texture generation
also takes about one second, but again this is a
preprocessing step if sufficient storage is available for
the texture.

The other important factor is the time to load each
separate program in the processor, when the microstore
is not big enough, which was the case in our system.
This is also dependent on the load on the VAX
and can take several seconds.

6.3. Possible Speed Improvements 

At present the tree creation system is several orders
of magnitude away from being real-time. The key to
improving the performance lies in a combination of
specialized processors and a suitable multiprocessor
architecture.

Specialized processors already exist for the type of
operations used in the system. For the creation of the
planes by normal splitting, most of the operations are
floating point operations, with calls to a normal
distribution function, and to trigonometric functions. The
functions can be replaced by lookup tables. In this
case, each splitting operation takes less than 20
floating point operations and/or lookup steps. The
number of splits necessary depends on the number of
trees, and their size on screen. It also depends on how
many different trees the system uses. There can be
many trees on the picture sharing the same convex
polyhedron. To take a numerical example, assume a
512x512 display, 200 trees, each on the average 20x20
pixels, and covering 1/4 of the screen, that is an
average depth complexity (for the trees only) of 1.22.
Further assume that a tree on the average goes from
6 planes (in the initial master) to 12 on the picture,
that is, it needs 6 plane splitting operations. At 60
frames/second, that means 1.4 MFLOPS for the
processor in charge of the splitting. This is easily
achievable on a custom VLSI.
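
The arithmetic behind this estimate can be checked with a few lines of
Python. The variable names are ours; the figures are those quoted in
the example, with 20 operations per split taken as the upper bound.

```python
# Figures from the numerical example in the text.
WIDTH = HEIGHT = 512
TREES = 200
TREE_PIXELS = 20 * 20          # average on-screen size of one tree
SCREEN_FRACTION = 0.25         # the trees cover 1/4 of the screen
SPLITS_PER_TREE = 12 - 6       # from 6 planes in the master to 12 displayed
OPS_PER_SPLIT = 20             # upper bound per splitting operation
FRAMES_PER_SECOND = 60

depth_complexity = TREES * TREE_PIXELS / (SCREEN_FRACTION * WIDTH * HEIGHT)
flops = TREES * SPLITS_PER_TREE * OPS_PER_SPLIT * FRAMES_PER_SECOND

print(f"depth complexity: {depth_complexity:.2f}")      # 1.22
print(f"splitting load: {flops / 1e6:.2f} MFLOPS")      # 1.44 MFLOPS
```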

The second step, and the main bottle-neck in the
current system, is the computation of the convex
intersection. As mentioned before, an architecture
such as used in the Pixel-planes is suitable for the
basic operations used in this step. Making the
assumptions in [FGHS85], that is a 10 MHz clock, and
reasonable values for the number of bits in the plane
equations, we obtain about 60 clock cycles per plane
scanned out, that is each plane is scanned out in 6 µs.
The trees can then be scanned out in 14 ms, which is
fast enough. Note that this is independent of the size
of the trees.
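
A short check of these figures, assuming (our reading, not stated
explicitly in the text) that the 14 ms total refers to the 200 trees of
12 planes each from the earlier numerical example:

```python
CLOCK_HZ = 10e6                # 10 MHz clock, as in [FGHS85]
CYCLES_PER_PLANE = 60
TREES = 200                    # assumed: same scene as the earlier example
PLANES_PER_TREE = 12           # assumed: same scene as the earlier example

seconds_per_plane = CYCLES_PER_PLANE / CLOCK_HZ
scan_out_seconds = TREES * PLANES_PER_TREE * seconds_per_plane
frame_seconds = 1 / 60         # time budget for one frame at 60 frames/second

print(f"per plane: {seconds_per_plane * 1e6:.1f} microseconds")
print(f"all trees: {scan_out_seconds * 1e3:.1f} ms")
print(f"frame budget: {frame_seconds * 1e3:.1f} ms")
```

Under these assumptions the scan-out time fits within one 60 Hz frame,
which is the sense in which it is "fast enough".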

The stochastic values needed for the roughening step
and the three-dimensional texture generation can be
supplied by a processor like the STINT [PiFo84]. The
current implementation of the STINT generates
two-dimensional texture, and can only generate a 70x70
texture in real time, but most of the textures needed
can be precomputed. It also should be noted that
specialized hardware for real time texture mapping is
already in use in flight simulators such as Evans &
Sutherland CT6 or General Electric Compuscene.
It remains to organize these processors into a suitable
display architecture. This is a complex task,
especially since there are other parts of the display system
to consider (terrain, buildings, moving vehicles,
atmospheric effects, etc.). This is left, as they say, to
further research.

7. Conclusions 

Within the stated limits: reasonably realistic trees, 
simple operations, adaptability and flexibility, we feel 
that the techniques described here succeeded fairly 
well. One interesting lesson is also that the system 
distinguishes clearly between the modelling of the 
shape, which is done with the implicit intersection of 
half-spaces, and the rendering method, which is the 
combination of a frame buffer algorithm and three- 
dimensional texture mapping. 

We saw also that the simplicity of the operations and
their modularity led to the conclusion that with
suitable specialized processors, the real-time generation
and display of several hundred such trees is possible.

ABSTRACT

A number of spectral modeling approaches in the engineering
and estimation literature are potentially applicable to
stochastic synthesis in computer graphics. Two specific approaches
are developed. The orthogonality principle of estimation
theory is used to derive a stochastic subdivision construction
with specified autocorrelation and spectrum properties; this
approach also provides an alternative theoretical basis for the
popular fractal subdivision algorithms. A shaped Poisson
point process is a second approach which conveniently
separates the spectral and graphic modeling problems.
Synthetic textures and terrains are presented as a means of
visually evaluating the constructed noises.

KEYWORDS: stochastic models, texture synthesis, fractals, 
terrain modeling. 

RÉSUMÉ

Les méthodes de modélisation spectrales empruntées aux
sciences de l’ingénieur, ou dérivées de la théorie de l’estimation
peuvent être appliquées à la synthèse stochastique dans le
champ de l’informatique graphique. Deux points de vues sont
présentés dans cette communication. À partir du principe
d’orthogonalité de la théorie de l’estimation on peut dériver
une méthode de subdivision stochastique possédant certaines
spécifications d’autocorrélation et propriétés spectrales; cette
approche fournit aussi une base théorique nouvelle pour la
construction d’algorithmes de subdivision fractale. Un processus
utilisant un filtrage de l’impulsion de Poisson fournit une
deuxième approche, qui permet de déterminer une séparation
claire des problèmes de nature spectrale de ceux liés à la
modélisation graphique. Les textures synthétiques et les
modèles de terrains présentés permettent d’évaluer
visuellement les bruits ainsi générés.

MOTS CLEFS: Modèles stochastiques, Textures synthétiques,
fractal, modélisation de terrain.

1. Introduction

Stochastic techniques have assumed a prominent role in
the synthesis of complex and naturalistic imagery, for
example, [1] [2] [3] [4] [5] [6] [7]. This role has been termed
amplification [5]: the image modeler specifies a
pseudo-random procedure and its parameters; the procedure can
then automatically generate the vast amount of detail
necessary to create a realistically complex scene. The
success of stochastic modeling depends both on its economy
and on our ability to construct stochastic models to
approximately emulate a variety of phenomena. The full
power of stochastic modeling has not been achieved in
existing techniques. For example, the widely-used
stochastic fractal techniques model only spectra of the form
$f^{-d}$, and thus cannot describe phenomena with
scale-dependent detail or directional or oscillatory
characteristics.

The problem of modeling a random process (“noise”) 
with an arbitrary spectrum is well understood. Basically, 
the procedure is to filter an uncorrelated noise (as 
obtained from a random number generator) to obtain the 
desired spectrum. The spectrum of the filtered noise is 
simply the squared magnitude of the transfer function of 
the filter. Using this synthesis procedure, many of the 
filtering and spectral analysis approaches described in the 
literature are potentially applicable to the problem of
stochastic modeling in computer graphics. This paper
adopts two approaches, optimal mean-square estimation 
and a shaped point process model, to produce stochastic 
synthesis algorithms which are computationally suitable 
for computer graphics applications. 


2. Generalized Stochastic Subdivision 

The stochastic subdivision construction described by
Fournier et al. [1] may be generalized to synthesize a
noise with an arbitrary prescribed spectrum (the
generalized subdivision technique is described in more
detail in [8]). The basis of the Fournier et al.
construction is a midpoint estimation problem: given two samples
considered to be on the noise, a new sample midway
between the two is estimated as the mean of the two
samples, plus a random deviation whose variance is the
single noise parameter. The construction is based on two
properties of fractional Gaussian noise:

1) When the values of the noise at two points are known, 
the expected value of the noise midway between two 
known points is the average of the two values. 

2) The increments of fractional Gaussian noise are
Gaussian, with variance which depends on the lag and on the
noise parameter.

Since only the immediately neighboring points are
considered in making the midpoint estimation, the noise
autocorrelation information is not used, and the
constructed noise is Markovian. This is not a limitation as
long as the construction is used as an approximate
(stationary) model for Brownian motion. However, the
construction has been applied to the non-Markovian
fractional noises $f^{-d}$, $d \neq 2$ [9]; in these cases,
disregarding the autocorrelation produces “creases”.

The general problem of estimating the value of a
stochastic process given knowledge of the process at other points
is the subject of estimation theory and of the Wiener and
Kalman filtering techniques [10]. The orthogonality
principle indicates that the mean-square error of a stationary
linear estimator will be minimum when the error is
orthogonal in expectation to the known values on the
process. It is also known that when the estimated process
is Gaussian (as in the case of fractional noises), the linear
estimate is optimal in the sense of being identical to the
best nonlinear estimate given the same number of
observations [11] [12]. Stochastic subdivision is specifically
similar to the application of digital Wiener filtering in the
linear-predictive coding (LPC) of speech [13], since in
both of these applications points on a stochastic process
are estimated, and then perturbed and re-used as
“observations”.

In our case, the midpoint $\hat{x}$ at each stage in the
construction will be estimated as a weighted sum of the noise
values x known from the previous stages of the
construction, in some practical neighborhood of size 2S:

$$\hat{x}_{t+0.5} = \sum_{k=1-S}^{S} a_k x_{t+k}$$

(with t indexing the points known at the previous
construction stage). The estimated value $\hat{x}_{t+0.5}$ will form a
new noise point with the addition of a random number of
known variance; the new points will in turn form some of
the data in subsequent construction stages.

The orthogonality principle then takes the form

$$E\Big\{ x_{t+m} \Big( x_{t+0.5} - \sum_{k=1-S}^{S} a_k x_{t+k} \Big) \Big\} = 0$$

or

$$E\Big\{ x_{t+m} x_{t+0.5} - \sum_{k=1-S}^{S} a_k x_{t+m} x_{t+k} \Big\} = 0$$

for $1-S \le m \le S$. Recalling that the expectation of
$x_{t+i} x_{t+j}$ is the value R(i-j) of the noise
autocorrelation function R (for a stationary noise), we obtain the
equation

$$R(m-0.5) = \sum_{k=1-S}^{S} a_k R(m-k)$$

which can be solved for the coefficients $a_k$ given R. The
matrix R(m-k) is Toeplitz, permitting the use of
efficient algorithms available for the inversion of these
matrices, such as the Levinson recursion [14]. The
mean-square estimation error

$$E\{(x - \hat{x})^2\} = R(0) - \sum_{k=1-S}^{S} a_k R(0.5-k)$$

is used to select the noise variance (and optionally the
neighborhood size) at each construction stage [8]. Fig. 1
illustrates successive stages in generalized subdivision to
an oscillatory noise with an autocorrelation
$R(\tau) = \cos(\omega\tau) \exp(-\tau^2)$.
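
As an illustration, the coefficient equations can be set up and solved
numerically. The sketch below is ours, not the implementation of [8]:
it uses plain Gaussian elimination rather than the Levinson recursion
mentioned above, the function names are hypothetical, and the value of
$\omega$ is an arbitrary choice.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def midpoint_coefficients(R, S):
    """Return (offsets k, coefficients a_k, mean-square error) for 2S neighbors."""
    ks = list(range(1 - S, S + 1))
    A = [[R(m - k) for k in ks] for m in ks]      # Toeplitz matrix R(m-k)
    b = [R(m - 0.5) for m in ks]                  # left-hand side R(m-0.5)
    a = solve(A, b)
    mse = R(0) - sum(ak * R(0.5 - k) for ak, k in zip(a, ks))
    return ks, a, mse

# Oscillatory autocorrelation from the text; omega = 2.0 is an assumption.
omega = 2.0
ks, a, mse = midpoint_coefficients(
    lambda tau: math.cos(omega * tau) * math.exp(-tau * tau), S=2)

# Markovian case: only the two nearest neighbors receive non-zero weight.
ks_markov, a_markov, _ = midpoint_coefficients(lambda tau: math.exp(-abs(tau)), S=3)
```

The second call illustrates why midpoint averaging of the two nearest
samples suffices for a Markovian noise: the remaining coefficients
vanish up to rounding error.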

2.1. Subdivision in two dimensions

The significant difference from the one-dimensional
solution is that there are now several classes of points to
be estimated, categorized by their spatial relationship to
the points computed at previous subdivision levels (this
depends on the selected interpolation mesh). For the
planar quadrilateral mesh shown in Fig. 2 the mid-face
vertex ‘x’ will require different coefficients than the
mid-edge vertices ‘o’. For example, (using our coordinate
system with the midpoints “indexed” by 1/2) the midpoint
coefficients are obtained by solving

$$R(j-0.5, i-0.5) = \sum_{r=1-S}^{S} \sum_{c=1-S}^{S} a_{r,c} R(j-r, i-c)$$

for $1-S \le j,i \le S$. This equation can be considered as a
system A x = b by rewriting $R(y,x)$ and $a_{r,c}$ as vectors
by a consistent ordering of the subscripts; the dimension
of the matrix A is now the square of the neighborhood
size 2S.

2.2. Evaluation 

The generalized subdivision technique produces
high-quality noises with specified spectra and eliminates the
creases associated with stochastic subdivision to
non-Markovian noises. It also shares the attractive properties
of the stochastic subdivision construction [1], i.e., the
consistency properties described in [1] including the
ability to model a noise at different resolutions, and the
ability to model regions of a noise in any order (a
“non-causal” property which is not available in Fourier
synthesis and other spectral synthesis approaches). When a
separable Markovian autocorrelation function
$R(x,y) = \exp(-|x|)\exp(-|y|)$ is specified, the
generalized subdivision reduces to a form of fractal
subdivision, in the sense that only the coefficients for the nearest
neighbors of an estimated midpoint are non-zero.
Subdivision to non-Markovian spectra is computationally more
expensive due to the larger neighborhood sizes required.
Fig. 3 shows several textures produced with the
generalized subdivision technique and Figs. 4, 5 illustrate two
height fields produced using this technique, displayed as
synthetic terrains.

Several limitations of the generalized subdivision
technique are:

One must know or invent the noise autocorrelation
function. Since the autocorrelation function is the Fourier
transform of the power spectrum (Wiener-Khinchine
relation), and the latter must be non-negative, the
autocorrelation function must be non-negative definite. Unless this
constraint is well understood, it may be easier to design
the power spectrum and obtain the autocorrelation by
transformation, or to restrict one’s choice to paradigmatic
or empirically estimated autocorrelation functions.

A second restriction of the generalized subdivision
technique derives from the variable-resolution property of
subdivision constructions. The identification of different
stages in the construction with different resolutions is
strictly incorrect. This can be seen from one point of
view by considering the problem of obtaining a
half-resolution version of a given noise record. A
half-resolution noise which preserves the spectral content of
the original up to the new, lower Nyquist rate is achieved
by low-pass filtering, followed by dropping every other
sample (“decimation”). The half-resolution noise
resulting from reducing the recursion level in a stochastic
subdivision construction is achieved by decimating without
filtering. A half-resolution noise does not in general
coincide with every other sample of the original noise unless
the latter has no detail at frequencies above half its
Nyquist rate. Thus, any spectral energy above half the
original Nyquist rate is aliased in changing the resolution
through the subdivision construction depth.

Significantly, an aliased noise does not form coherent 
artifacts such as Moire patterns; rather, the noise at the 
lower resolution appears as a somewhat different noise 
than the original, so the subject may appear to “bubble” 
during a zoom. The aliasing is limited for noises with 
monotonically decreasing spectra such as fractal noises, 
since the majority of the spectral energy remains 
unaliased in any resolution change. However, serious 
anomalies may occur if the resolution of a noise whose 
spectrum is flat or increasing at some frequencies (as may 
be achieved with the generalized subdivision technique) is 
varied by changing the subdivision recursion depth. 
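
The aliasing produced by decimating without filtering can be seen on a
single frequency component. The following sketch (ours, with arbitrary
frequencies) samples two cosines, one above and one below half the
Nyquist rate of 0.5 cycles per sample, and shows that they become
indistinguishable once every other sample is dropped.

```python
import math

N = 64

def tone(freq, n_samples, step=1):
    """Sample cos(2*pi*freq*n) at n = 0, step, 2*step, ..."""
    return [math.cos(2 * math.pi * freq * n) for n in range(0, n_samples, step)]

# Full-resolution records: clearly different signals.
full_diff = max(abs(h - l) for h, l in zip(tone(0.4, N), tone(0.1, N)))

# Decimated by two without low-pass filtering.
high = tone(0.4, N, step=2)   # 0.4 cycles/sample: above half the Nyquist rate
low = tone(0.1, N, step=2)    # 0.1 cycles/sample: below half the Nyquist rate
decimated_diff = max(abs(h - l) for h, l in zip(high, low))

print(f"difference at full resolution: {full_diff:.3f}")
print(f"difference after decimation:   {decimated_diff:.2e}")
```

After decimation the 0.4 component masquerades as the 0.1 component,
which is the energy that would have been removed by the low-pass filter.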


3. Shaped Point Process 

A second stochastic synthesis algorithm is suitable when
samples of the desired noise are available. An
analysis-synthesis approach would analyze the noise to determine
parameters of a stochastic model, and then apply the
model to generate a synthetic noise. If the only goal is to
synthesize the noise, however, a more direct approach is
feasible: the noise x is produced by a (discrete)
convolution

$$x_t = \sum_{k=-S}^{S} h_k u_{t-k}$$

of an uncorrelated noise u with the (windowed) noise
sample h of size 2S+1, with h playing the role of a filter
kernel. The autocorrelation of x is easily derived:

$$R(\tau) = E\{x_t x_{t+\tau}\}
        = E\Big\{ \sum_k \sum_m h_k h_m u_{t-k} u_{t+\tau-m} \Big\}$$

The noise u is stationary and uncorrelated so the
expectation of the factors $u_{t-k} u_{t+\tau-m}$ is
$E\{u^2\}\,\delta(\tau+k-m)$, so (taking $E\{u^2\} = 1$)

$$R(\tau) = \sum_k h_k h_{k+\tau} \qquad (1)$$

The power spectrum of x is the Fourier transform of R.
Since (1) is a convolution $h_t * h_{-t}$, its transform is (by
the convolution theorem [15])

$$S(\omega) = H(e^{j\omega}) H(e^{-j\omega}) = |H(e^{j\omega})|^2$$

so the spectrum of x is that of h (as expected). The
spectrum of the noise sample h will in turn resemble that
of the prototype noise if it is large enough to include any
low-frequency components characteristic of the prototype
and if it is windowed to reduce the effects of
discontinuities at the sample boundary.
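
The relation between the kernel, the autocorrelation and the spectrum
can be verified numerically. In this sketch (ours) the kernel values
are arbitrary, the driving noise is assumed to have unit variance, and
the helper names are hypothetical.

```python
import cmath
import math

def autocorrelation(h, tau):
    """R(tau) = sum_k h[k] * h[k + tau] for a finite kernel h."""
    n = len(h)
    return sum(h[k] * h[k + tau] for k in range(n) if 0 <= k + tau < n)

def dtft(seq, start, omega):
    """Fourier transform of a sequence whose first element sits at index `start`."""
    return sum(v * cmath.exp(-1j * omega * (start + i)) for i, v in enumerate(seq))

h = [0.2, 0.5, 1.0, 0.5, -0.3]           # arbitrary illustrative kernel
n = len(h)
R = [autocorrelation(h, tau) for tau in range(-(n - 1), n)]

# For each frequency: (transform of R, squared magnitude of H).
checks = [(dtft(R, -(n - 1), w), abs(dtft(h, 0, w)) ** 2)
          for w in (0.0, 0.7, 1.9, math.pi)]
```

At every test frequency the transform of R agrees with the squared
magnitude of the kernel's transfer function, up to rounding error.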

Convolution with a large noise sample is inefficient and
the convolution would usually be implemented in the
frequency domain by FFT. Computational economy can
also be achieved by replacing the noise u with a ‘sparse
noise’ or particle system (sampled Poisson point process)
u which is non-zero at a limited number of points under
the sample h. The reduced convolution takes the form

$$x_t = \sum_k u_k h(t - t_k) \qquad (2)$$

where $t_k$ is the location of the k-th non-zero point of the
process, and the summation is now over these points
rather than over h (a similar technique was described as
one of the methods in [16] but its use as a general
spectral modeling approach was not fully developed there).
The autocorrelation and spectrum are unchanged
provided the values of u are independent. This “shaped
point process” resembles both shot noise (in which the
noise u is defined to be a constant-amplitude Poisson
impulse process), and a generalized form of pulse
amplitude modulation reconstruction, which requires $t_k$ to be
evenly spaced.
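
A minimal one-dimensional sketch of the reduced convolution (2)
follows. The triangular kernel, the number of points and the amplitude
distribution are our assumptions, chosen only to make the example
concrete; in the method itself h would be a windowed noise sample.

```python
import random

def sparse_noise(points, kernel, t):
    """Evaluate x(t) = sum_k u_k * h(t - t_k) over the non-zero points."""
    return sum(u * kernel(t - tk) for tk, u in points)

def tent(d, half_width=4.0):
    """Hypothetical kernel: a triangular bump, zero beyond half_width."""
    return max(0.0, 1.0 - abs(d) / half_width)

rng = random.Random(1)
# Poisson-like placement: uniform random locations, independent amplitudes.
points = [(rng.uniform(0.0, 100.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
samples = [sparse_noise(points, tent, t) for t in range(100)]

# A single impulse simply reproduces the scaled, shifted kernel.
single = sparse_noise([(10.0, 2.0)], tent, 12.0)
```

The cost of each output sample is proportional to the number of points
rather than to the kernel size.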

3.1. Spectral and graphic modeling

The primary advantage of this algorithm is not efficiency,
however, but that it suggests manipulating the point
process as an entity itself. For example, to produce a ‘fluid
texture’ by animating the point process requires only
updating the location of each point by a dynamic
equation, whereas manipulating a uniformly sampled noise
field to the same effect requires computations more
analogous to those of a fluid flow problem on a uniform grid.
Similarly, points may be restricted to an area of the plane
with conceptually simple algorithms such as Monte Carlo
or an ad hoc placement procedure, whereas restricting a
noise field requires scan converting the boundary of the
region or a global windowing operation.

The non-causal property of subdivision methods is
achieved in a shaped point process using an appropriate
(non-causal and consistent) construction of the point
process. A simple construction is to divide the noise domain
into numbered cells and approximate the Poisson point
process by N points in each cell, with the random
number generator seeded by the cell number. The value
of the shaped noise at a particular point is obtained by
(2) summed over only the points in those cells which are
closer than a radius the size of the kernel.
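
The cell construction can be sketched in one dimension as follows. The
cell width, the kernel and the number of points per cell are our
assumptions; the cell width is chosen equal to the kernel radius so
that only a cell and its two neighbors can contribute.

```python
import math
import random

CELL = 8.0                # cell width, chosen equal to the kernel radius
POINTS_PER_CELL = 3       # N points approximate the Poisson process per cell

def kernel(d):
    """Hypothetical kernel with support |d| < CELL."""
    return max(0.0, 1.0 - abs(d) / CELL)

def cell_seed(cell):
    """Map signed cell numbers to distinct non-negative seeds."""
    return 2 * cell if cell >= 0 else -2 * cell - 1

def cell_points(cell):
    """Regenerate the points of one cell from a generator seeded by its number."""
    rng = random.Random(cell_seed(cell))
    return [((cell + rng.random()) * CELL, rng.gauss(0.0, 1.0))
            for _ in range(POINTS_PER_CELL)]

def noise(t):
    """Sum equation (2) over the cells within one kernel radius of t."""
    c = math.floor(t / CELL)
    return sum(u * kernel(t - tk)
               for cell in (c - 1, c, c + 1)
               for tk, u in cell_points(cell))

first = noise(37.5)
elsewhere = [noise(t) for t in (400.0, -3.0, 12.0)]   # evaluate in any order
again = noise(37.5)                                   # same value as `first`
```

Because each cell's points are regenerated from its own seed, the noise
can be evaluated at any location, in any order, with consistent results.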

The kernel h can also be manipulated independently of
the point process. The spectral bandwidth of a shaped
point process is entirely determined by the kernel. If the
size of the kernel is small compared to the depth in a
perspective view of a shaped point process noise, the noise
can be accurately and efficiently anti-aliased by selecting
appropriate precomputed bandlimited versions of the
kernel as a function of depth. The kernel can be varied as a
function of the position of each point to produce a
non-stationary noise. For example, wind-blown clouds or
terrain ridges where the directional tendency varies over the
scene could be emulated by rotating the kernel as a
function of position. This type of control is not directly
available in most filtering techniques; e.g. it is achieved in
a Fourier transform method only by breaking the noise
into small overlapping stationary regions and
interpolating the synthesis on these regions (overlap-add method
for short-time Fourier transformation).

Thus, a shaped point process provides a convenient
separation between the spectral modeling problem
(obtaining the kernel) and the graphic modeling problem
of shaping the noise to form a subject. (A similar
separation occurs in ‘waveform’ speech synthesis: a kernel is
used to model the formant (spectral) shape; it is
convolved with an impulse sequence or noise representing the
voice pitch and amplitude [17]).

3.2. Evaluation 

The shaped point process is a simple means of
approximately “resynthesizing” noises. The method also
generalizes directly to several dimensions. Fig. 6 shows the
shaped point process resynthesis of several texture
samples from the Brodatz album [18]. Resynthesis is of
course more intuitive than specifying the parameters of a
texture model, and it allows the generation of
homogeneous noises of arbitrary extent. Periodic noises can be
produced by altering the addressing in (2) to wrap around
specified boundaries; this is a useful property in
applications such as texture mapping. The shaped point process
can also be applied with an analytically defined kernel;
the lower right plot in Fig. 6 is a perspective view of a
wave-like texture created with a bandpass kernel of the
form $R(x,y) = \cos(\alpha x + \beta y) \exp(-\sqrt{x^2 + y^2})$.

The textures in Fig. 6 also suggest the limitations of the
shaped point process method, and of spectral methods in
general. The phase spectrum in a spectral synthesis
method is that of the driving noise, which is random.
Thus spectral synthesis cannot produce a coherent-phase
texture such as a brick wall pattern. In fact, given a step
function for the kernel h, the shaped impulse process will
result in a $f^{-2}$ noise - the spectrum of the kernel is
reproduced but the visual appearance is quite different.

The grey levels in a texture photograph reflect the
illumination of the texture and may not directly correspond to
‘physical’ properties of the texture such as color or relief
depth. Thus, a texture synthesized from a photographic
sample will reproduce the spectral character of the
texture as illuminated rather than as we perceive it. One
common effect is that sharp cast shadows produce
discontinuities in the texture kernel and so introduce $f^{-2}$ noise
into the synthesized texture.

Unlike subdivision constructions, a shaped point process
noise has definite inner and outer scales. The
autocorrelation (1) is zero beyond the width of the kernel, so the
noise is uncorrelated at scales larger than this width (this
can be seen in Fig. 6 as the scale at which the textures
become “blotchy”). The inner scale is of course the
Nyquist rate determined by the (fixed) sample rate of the
noise. The bandwidth available in a shaped point process
is nevertheless considerably greater than that available in
many artificial texturing methods (e.g. [19]) and is
adequate for many purposes, since a stochastic model will
rarely be applicable over a very broad range of scales in
any case. Also, some phenomena such as waves, fire, and
bark which might be modeled by stochastic methods are
often fairly smooth above and below a range of scales.

4. Non-Gaussian Noises 

By a loose version of the central limit theorem, the
probability density of a noise produced with spectral
synthesis will tend to be Gaussian regardless of the density
of the driving noise, since the spectral shaping operation
is effectively a linear filter or a weighted sum of the input
noise values [15]. It is sometimes desirable to model
non-Gaussian processes. For example, with respect to the
uniform or normal distributions, a distribution such as
$\exp(-|x|)$ has an increased number of ‘events’ far
removed from the mean. Transforming a Gaussian noise
to have a higher-variance non-Gaussian distribution tends
to differentially exaggerate the most pronounced portions
of the noise and so can produce the impression of a
‘subject’ against a background, or of a non-stationary noise.
Some of the published fractal landscape pictures depict
fractional noises passed through a square or cube
nonlinearity which improves their appearance.

The probability density of a random process can be
shaped by means of a memoryless nonlinear
transformation g(x). For this purpose it is sufficient to consider
only monotonically increasing g(x). Then, by
“conservation of probability”, the probability of an event
$\mathbf{y} \le y$ where y = g(x) is identical to that of the
event $\mathbf{x} \le x$:

$$F_y(g(x)) = P\{\mathbf{y} \le y\} = F_x(x) = P\{\mathbf{x} \le x\}$$

or

$$F_y(y) = F_x(x)$$

so

$$g(x) = F_y^{-1}[F_x(x)]$$

Two cases are particularly useful. When x is uniformly
distributed in (0,1), $F_x(x) = x$ so the nonlinear function
g(x) which shapes a uniform noise x to have a desired
distribution $F_y$ is just $g = F_y^{-1}$. When the desired
distribution $F_y$ is uniform, $g = F_x$. Thus, the procedure to
transform a noise to have a desired distribution is to first
pass the noise through its own distribution function to
make a uniform (0,1) noise, and then use the result to
index the inverse of the desired distribution function.
Both of these operations can be implemented by table
lookup for reasonably smooth functions, so distribution
shaping can be very efficient.
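
The two-step procedure can be sketched for one concrete pair of
distributions. Here (our choice of example) a unit-variance Gaussian
noise is reshaped to the two-sided exponential density
$\tfrac{1}{2}\exp(-|y|)$; the functions are evaluated directly rather
than by table lookup, and the function names are hypothetical.

```python
import math
import random

def gaussian_cdf(x):
    """F_x for a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def laplace_inverse_cdf(p):
    """F_y^{-1} for the density 0.5 * exp(-|y|), with 0 < p < 1."""
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))

def shape(x):
    """g(x) = F_y^{-1}[F_x(x)]: Gaussian sample -> two-sided exponential."""
    p = gaussian_cdf(x)
    p = min(max(p, 1e-300), 1.0 - 1e-15)   # keep p strictly inside (0, 1)
    return laplace_inverse_cdf(p)

rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(10000)]
shaped = [shape(x) for x in gaussian]

mean = sum(shaped) / len(shaped)
variance = sum((y - mean) ** 2 for y in shaped) / len(shaped)
```

The sample variance of the shaped noise should be close to 2, the
variance of the target density, while the ordering of the samples is
preserved because g is monotonically increasing.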

4.1. Effect on spectrum 

The nonlinearity which shapes the distribution can also
have a powerful effect on the spectrum of a correlated
noise, however. This can be appreciated by considering
the potential effect of a nonlinearity g(x) on a single
“frequency component” $\cos(\omega t)$. By choosing
$g(x) = f(\cos^{-1}(x))$, an arbitrary periodic waveform
$f(\omega t)$ is produced at the output of the nonlinearity
given the single frequency as input. The envelope of the
spectrum at the output of the nonlinearity also depends
on the amplitude of the input signal. A signal passed
through a nonlinearity does not obey either the
superposition or homogeneity principles of linear systems, so the
effect of a nonlinearity on a noise cannot be analyzed as
the superposition of its frequency components.

A general expression for the autocorrelation function at
the output of g(x) is [20]

$$R(\tau) = \iint g(x_1)\, g(x_2)\, f_x(x_1, x_2; \tau)\, dx_1\, dx_2$$

where $f_x$ is the second-order joint probability density of
the input. The spectrum of the output is the transform
of this. However, this integral is difficult to evaluate and
analytic solutions are known only for some special cases,
including various cases where $f_x$ is Gaussian. Beckmann
[20] gives an expression for the distorted correlation
function of a Gaussian noise as a series involving weighted
powers of the input autocovariance. The output
spectrum is the transform of this series, which by the
modulation (or convolution) theorem is a weighted series of
n-th-order self convolutions of the input spectrum. This
effect is illustrated in Fig. 7. In theory it should be
possible to design the spectrum of the undistorted noise so
that a desired spectrum is achieved after distortion, but
this approach has not been formulated to the author’s
knowledge.

We conclude that nonlinear distortion is a powerful
means of generating correlated non-Gaussian noises.
However, this approach should be used carefully if
accurate control of the spectrum and probability density are
required. For example, some of the “fractal Gaussian”
terrains we have seen are probably neither Gaussian nor
of the attributed spectral exponent or fractal dimension
as a result of squaring or other nonlinear distortions (e.g.
a squared Gaussian noise has a one-sided probability
density

$$f_y(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}, \quad y \ge 0$$

which is quite different from the Gaussian density).


5. Conclusion 

Two spectral methods for stochastic synthesis were
described. Spectral approaches allow the synthesis of
noises with arbitrary power spectra, and so can describe
both narrowband deterministic-like noises such as [19]
and broadband random noises such as fractals, as well as
noises which exhibit a mixture of structure and
randomness.

Fig. 2: Planar quadrilateral subdivision mesh using a $4^2$
neighborhood. The vertices ‘o’ and ‘x’ are estimated using the
surrounding ‘observation’ points.








Fig. 1: (top to bottom) Stages in generalized subdivision to a 
non-fractal (oscillatory) noise. 


Fig. 7: Bandpass noise and spectrum (lower figures) and noise
and spectrum at the output of a pair of nonlinearities effecting
a hyperbolic probability density. The self-convolution of the
input spectrum produced by the nonlinearities results in an
odd-harmonic spectrum structure.

Fig. 3: Several textures produced using the generalized
subdivision technique. Clockwise from top left: Markovian,
oscillatory (shaded as an obliquely illuminated height field),
Gaussian, and highpass isotropically oscillatory textures.



Fig. 4: Generalized subdivision terrain with an isotropic
autocorrelation $R(x,y) = \exp(-(x^2+y^2)^{0.7})$. This figure resembles
a power-modified fractal terrain but it can be distinguished (in
being smoother) in a comparison.



Fig. 5: Synthetic sky and terrain with directional trend
produced with generalized subdivision.



Fig. 6: Several shaped impulse process textures.
Counterclockwise from top right: rough waves (shaded as an obliquely
illuminated height field), fieldstone, and straw synthesized
from Brodatz [18]. The last figure is a perspective view of a
wave-like texture.

Interactive 3-D Modeling with Personal Computers

Abstract

An interactive three-dimensional modeling system, called
Facet, is presented. Facet was developed for use during the
design of physical artifacts: from studies of simple shapes,
spaces, and forms to detailed modeling of complex physical
systems. Facet provides a visual medium for enhancing
perception of the model and for exploring various
alternatives during the design process. In order to make
this 3-D modeling medium available and affordable to any
interested design professional, Facet operates on commonly
available IBM or compatible personal computers. In
addition to an overall description of Facet and its
interactive modeling techniques, several underlying issues
in the development of a modeling system are discussed.
These issues include: the interactive user interface; 3-D
display techniques; and ways of dealing effectively with
model complexity. A number of areas for continued
development, based on experience with the current
implementation, are discussed.

KEYWORDS: configurable, interactive, three-dimensional,
modeling, computer graphics, personal computer, design.

1. Introduction to Facet 

Facet is an interactive 3-D modeling system for design 
professionals that runs on widely available personal 
computers (PCs). It allows a user to develop and 
manipulate 3-D computer models of shapes, spaces, and 
forms for use during the design process. Models are 
displayed on the screen in a wire-frame form while the 
user is interactively building or modifying the model. 
Selected views (including perspectives) may subsequently be 
rendered by the computer with shaded surfaces or may be 
output with a pen plotter or dot matrix printer. These 
two-dimensional pictures can then be the basis for working 
drawings or other enhanced renderings to be used for 
communicating the design to others. 

Facet is initially directed to spatial designers who need to
study alternative forms and shapes during the design
process and create visual representations of physical
systems. However, it will be readily adapted for use in
other diverse areas of 3-D modeling, such as: generating
input for graphic arts illustration systems as an aid in
producing perspective drawings and renderings; building
models, sets, and backgrounds for use by a 3-D animation
system, when the elements being designed are static and do
not require the expensive hardware necessary for real-time
dynamic manipulation; and generating charts, technical
illustrations, and other graphics to be incorporated with
word processor output for document production.

A careful combination of features positions Facet as a
readily available tool for 3-D modeling integral to the
design process: a need that has not been adequately
addressed affordably in the past. Some of the key aspects
of Facet that recognize this need follow.

• Perhaps the most important aspect of Facet is the 
quality of interaction. The emphasis is on providing a 
modeling aid for use during the initial design stages. 
This is accommodated to a large degree in the way the 
designer is able to effectively interact with this 
electronic modeling medium. For example, the 
designer is able to quickly develop models of shapes, 
spaces, and forms; examine the model from many 
points of view to establish an accurate mental 
perception; and easily make changes in exploring 
different alternatives. 

• A second key aspect of Facet is that it can deal with 
complex models. Unlike some previous modelers, 
project complexity is not limited by the computer’s 
address space. Features are provided that enable the 
designer to organize the model with structure, 
grouping, and symbolic naming to work with 
abstractions and deal effectively with complex or 
detailed project models. 

• Third, the ability to generate high quality shaded 
surface renderings, after interactively constructing the 
model, is an integral feature of Facet. The designer is 
able to assign various attributes to elements of the 
model that allow Facet to exercise the full capabilities 
of a high quality display device, if available, yet still 
operate effectively when using less capable display 
hardware. 

• Finally, low cost is a significant feature of Facet to 
make it feasible for use by any interested professional 
designer. The basic personal computer system to 
support Facet can be purchased today for as little as 
$2,000. (A much faster, full-featured system with the 
ability to generate full color renderings can be 
assembled for less than $10,000.) Marketing strategy 
and pricing for the Facet software have not been 
finalized. 

To be available to the largest group of users, Facet runs
under the PC-DOS (MS-DOS) operating system on IBM (or
compatible) PCs. The minimum hardware configuration
includes the basic PC with 640K bytes of main memory, a
math coprocessor, a hard disk, a supported graphics display
subsystem, and a locator (e.g. mouse). MS-DOS computers
with faster or more powerful processors than the standard
PC, but not as compatible, are also supported by Facet as
long as an appropriate graphic display and locator are
used. Any such increase in power will apply directly to
improving the speed of interaction, for example, when
calculating and displaying a new view of the model.

Facet supports several commonly used graphics devices
including IBM’s Color Graphics Adapter and Enhanced
Graphics Adapter and the Microsoft mouse. Full color
display systems (such as Number Nine’s 512 X 32) are 
supported for smooth shaded color renderings. A variety of 
hardcopy devices are supported, including pen plotters, dot 
matrix printers, and film recorders. 

2. Model Editing Techniques 

Models in Facet are represented geometrically as polylines 
(one or more connected line segments) and polygonal 
surfaces. In addition, polygons are often joined together 
(sharing common edges and vertices) to form polyhedral 
shapes. Although curved lines and surfaces are not 
represented specifically, they are readily accommodated 
with the use of approximating polylines and polyhedra. (In 
particular, built-in primitive shapes are provided to quickly 
model arcs, domes and other common objects.) There is a 
display attribute that may be specified for such 
approximating polyhedra that causes them to be smoothed 
together (using Gouraud or Phong shading) when 
subsequently rendered with shaded surfaces. 

Although the representation of models in Facet is fully 
three-dimensional, we are limited by our display hardware 
to visible windows on the model that are 2-D planes of 
projection. Likewise, we are limited by our pointing 
hardware to show locations in a single plane at any given 
time or to select or pick existing elements of the model by 
pointing to their displayed location on the screen. Facet 
employs two primary concepts in its operations for 
constructing and modifying models that minimize these 
inherent limitations: 

• The first is to provide a current plane in 3-D space on 
which operations may be performed. For example, 
detailed surface features may be drawn on the face of 
an object after positioning the current plane as 
required. This current plane is readily positioned and 
oriented, as desired, and is displayed graphically on 
the screen with the model. For most operations in the 
current plane the user will find it convenient to select 
the orthographic projection with the current plane as 
the picture plane: making the surface of the display 
screen congruent with the current plane. 

• The second concept is to do operations relative to the 
position or shape of existing elements in the model. 
For example, an object may be rotated to align with 
an edge of another existing object; the existing edge is 
readily picked by pointing at it on the display. 
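
The current-plane idea can be sketched in a few lines. This is only an illustration, not Facet's implementation: the `Plane` class, its fields, and `to_world` are hypothetical names, and the plane is assumed to be described by an origin and two orthonormal in-plane axes.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plane:
    """Hypothetical current plane: an origin plus two orthonormal in-plane axes."""
    origin: Vec3
    u_axis: Vec3
    v_axis: Vec3

    def to_world(self, u: float, v: float) -> Vec3:
        """Lift a 2-D locator position (u, v) on the plane to a 3-D world point."""
        return tuple(
            o + u * a + v * b
            for o, a, b in zip(self.origin, self.u_axis, self.v_axis)
        )


# A plane parallel to the x-z plane, offset 5 units along y.
front = Plane(origin=(0.0, 5.0, 0.0), u_axis=(1.0, 0.0, 0.0), v_axis=(0.0, 0.0, 1.0))
point = front.to_world(2.0, 3.0)
```

With the orthographic view aligned to the plane, the mouse supplies only `(u, v)`; the third coordinate comes from where the plane has been positioned.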

2.1. Current Plane Editing 

A common strategy for constructing models with Facet is
to begin by defining lines and polygons in space by working
in selected positions of the current plane. Elements may
then be connected or combined with each other by picking
from the existing vertices to form additional lines or
polygons. There is also a powerful extrusion operation
with which the user can sweep existing lines and polygons
through space to form polyhedra whose cross sections are
defined by the lines and polygons being extruded.

Work in the current plane is normally done graphically 
by pointing at desired locations on the plane with the 
locator (e.g. mouse). To provide an intuitive capacity for
drawing on the current plane, an orthographic view with
the current plane as the picture plane is normally used, so
that the current plane is congruent with the display screen.
The user can zoom the display in or out and pan around to 
focus on desired areas of the current plane. There are 
some unique aids to quickly and accurately select desired 
locations in the plane. These include: 

• multiple grids of coordinates to which points may be 
automatically latched (with graphic definition of, and 
graphic feedback on, the operation of the grid’s 
gravity field). By using two grids the effect of a 
rotated grid may be obtained. 

• sets of slopes to which lines may be automatically 
snapped; and 

• background graphics (derived from other views of the
current model, or other Facet projects) with which 
points may be automatically aligned. 

The coordinate locations generated by any of these 
automatic latching aids are, of course, calculated to the full 
resolution (32 bits) of the model’s world space and are not 
affected by the resolution or zoom factor of the display 
being used. These aids may be grouped together as an 
environment to be saved under a user specified name for 
later use in the same plane or other planes or even in 
different Facet projects. 
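
A minimal sketch of grid latching with a gravity field follows. It is an assumption-laden illustration rather than Facet's routine: the function name `snap_to_grid` and its parameters are hypothetical, the grid is taken to be square, and the gravity field is modeled as a circular radius around each grid point. All values are in world units, so the result does not depend on display resolution or zoom.

```python
def snap_to_grid(x, y, spacing, gravity, origin=(0.0, 0.0)):
    """Latch (x, y) to the nearest grid point if it lies within the gravity radius.

    Points outside every gravity field are returned unchanged.
    """
    ox, oy = origin
    # Nearest grid point, computed in world coordinates.
    gx = ox + round((x - ox) / spacing) * spacing
    gy = oy + round((y - oy) / spacing) * spacing
    if (gx - x) ** 2 + (gy - y) ** 2 <= gravity ** 2:
        return (gx, gy)
    return (x, y)


latched = snap_to_grid(10.2, 19.9, spacing=10.0, gravity=0.5)
free = snap_to_grid(14.0, 14.0, spacing=10.0, gravity=0.5)
```

Here `latched` lands exactly on a grid point while `free` is left where the user pointed, because it is too far from any grid point to be captured.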

Most of the model change operations, such as moving, 
rotating, and changing size or shape, may be performed 
with either absolute locations or measures (e.g. selected 
graphically in the current plane) or with locations or 
measures relative to existing elements of the model. 

2.2. Extrusion 

A flavor of how Facet works may be gained by following
the steps of an extrusion operation. Extrusion takes groups
of polygons or edges and sweeps them through space to
form new polygons. The sweep proceeds in a specified
direction and is terminated by a specified plane.

Figure 2-1 shows a simple building envelope being
modeled in Facet; the pediments of the two projecting
wings have been drawn by placing the current plane at an
offset from the front of the wings. The user would like to
extrude these triangles into the main roof to complete the
form. First, the direction of extrusion is specified. In this
example the user picks an edge in the projecting wing to
define the extrusion direction.¹ Second, the user defines
the terminating plane. In the example the user picks three
vertices that define the near slope of the existing roof over
the main wing of the building.² Finally, the user selects
the elements to be extruded. After the two pediments
have been selected, the extrusion takes place, with the
results shown in figure 2-2.



Figure 2-2: Extrusion Completion 
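
The three steps of the example (direction, terminating plane, elements) can be sketched as a ray-plane intersection per vertex. This is an illustrative reconstruction under stated assumptions, not Facet's code: the helper names are hypothetical, the plane is given by three picked vertices, and each vertex is simply slid along the direction until it reaches the plane.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_vertices(p0, p1, p2):
    """Return (normal, point) for the plane through three picked vertices."""
    return cross(sub(p1, p0), sub(p2, p0)), p0


def extrude_to_plane(polygon, direction, plane):
    """Sweep each vertex along `direction` until it meets `plane`.

    Returns the swept vertices and one quadrilateral side face per polygon edge.
    """
    normal, point = plane
    denom = dot(normal, direction)
    if denom == 0:
        raise ValueError("extrusion direction is parallel to the terminating plane")
    swept = []
    for v in polygon:
        t = dot(normal, sub(point, v)) / denom  # distance along the direction
        swept.append(tuple(c + t * d for c, d in zip(v, direction)))
    n = len(polygon)
    sides = [(polygon[i], polygon[(i + 1) % n], swept[(i + 1) % n], swept[i])
             for i in range(n)]
    return swept, sides


# A triangular pediment in the plane y = 0, extruded along +y into a sloped roof.
pediment = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
roof = plane_from_vertices((0.0, 4.0, 0.0), (1.0, 4.0, 0.0), (0.0, 3.0, 1.0))
swept, sides = extrude_to_plane(pediment, (0.0, 1.0, 0.0), roof)
```

Because the roof slopes, the apex of the pediment travels a shorter distance than its base, which is what lets the extruded wing meet the main roof cleanly.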

The direction of an extrusion may also be specified by a 
polyline to create a multiple step extrusion. Here the 
terminating plane for the intermediate changes in direction 
at the polyline vertices is the average between the planes 
normal to the two adjacent edges of the polyline. An 
example of multiple extrusions is shown in figure 2-3 which 
shows a circle that has been extruded along the top of a 
building to form a pipe. 
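
One plausible reading of the "average between the planes normal to the two adjacent edges" is a plane through the polyline vertex whose normal is the normalized sum of the two unit edge directions. The sketch below assumes that reading; `joint_planes` and `unit` are hypothetical names.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def joint_planes(polyline):
    """Return (normal, point) for the averaged plane at each interior vertex.

    Assumes consecutive edges do not reverse exactly, since the summed
    direction would then be zero and could not be normalized.
    """
    planes = []
    for prev, here, nxt in zip(polyline, polyline[1:], polyline[2:]):
        e_in = unit(tuple(b - a for a, b in zip(prev, here)))
        e_out = unit(tuple(b - a for a, b in zip(here, nxt)))
        normal = unit(tuple(a + b for a, b in zip(e_in, e_out)))
        planes.append((normal, here))
    return planes


# A path that runs along x and then turns 90 degrees to run along y.
path = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
planes = joint_planes(path)
```

At the right-angle corner the resulting plane is tilted 45 degrees to both edges, so the cross sections swept along each leg meet in a mitred joint.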

Figure 2-3: Extrusion along a Polyline

¹ In general, the direction of the extrusion may be the vector
determined by picking an existing edge or two existing vertices, or
alternatively the direction perpendicular to a plane or to two edges
(here it is really the cross product of the two edges).

² Likewise, the plane terminating an extrusion may be specified in
one of several different ways: first, by using the current working
plane; second, by using a plane that is parallel to but offset by a
distance from the current plane; third, by picking three vertices in
the model that lie in the desired plane; and fourth, by using the plane
that passes through a selected vertex and is perpendicular to a
selected edge.

2.3. Programmable Commands

Incorporated in Facet is a command processor that
permits a user to add new operations to customize the
system. This goes beyond the macro capabilities of other
systems to permit operations on the model representations
that are not available as interactive commands. Most
modeling systems such as Facet are organized to
accommodate a large class of geometric shapes for parts,
but the user is really working within a discipline that
imposes restrictions on this class of shapes for a given type
of part. Knowledge of such restrictions may be used to
advantage in the definition of operations provided for the
particular type of part. Operations restricted in this
manner make it easier to construct valid examples of such
parts. For example, if a particular rectangular solid
represents a wall, the different compositions that may be
constructed by the user using that rectangular solid should
be restricted to those that are possible with walls.
Restriction of the possibilities that may be modeled during
the design of an artifact makes the modeling task faster [4].
In order for a modeling system to speed the task of the user
it should take advantage of any inherent restrictions and
define the manipulation operations so they reflect those
that may occur on the actual artifact.

3. Flexibility in User Interface 

Perhaps the most important aspect of Facet is the quality 
of interaction. The emphasis is on providing a modeling 
aid to the designer for use during the initial design stages. 
This is accommodated to a large degree in the way the 
designer is able to effectively interact with this electronic 
modeling medium. For example, the designer is able to 
quickly develop models of shapes, spaces, and forms; 
examine the model from many points of view to establish 
an accurate mental perception; and easily make changes in 
exploring many alternatives. 

The user’s interface to Facet's operations is at the same 
time easy to learn and effective in use for the expert. All 
operations may be accessed through mouse activated menus 
(which may be restructured by the designer to fit any 
particular desires). In addition, operations may be bound 
to any of the keys of the keyboard for invocation at the 
push of a button. Any operation may also be accessed by 
explicitly typing in a unique abbreviation of the operation’s 
name or selecting it from a scrollable window containing 
descriptions of all the operations. 
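
Invocation by a unique abbreviation can be illustrated with simple prefix matching. The operation names and the function `resolve_operation` are hypothetical; the paper does not list Facet's actual command names or say how ambiguity is reported.

```python
OPERATIONS = ["extrude", "erase", "move", "rotate", "scale"]  # hypothetical names


def resolve_operation(typed, operations=OPERATIONS):
    """Return the single operation whose name starts with `typed`.

    Raises KeyError when the abbreviation matches nothing or is ambiguous.
    """
    typed = typed.lower()
    matches = [name for name in operations if name.startswith(typed)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no operation matches {typed!r}")
    raise KeyError(f"{typed!r} is ambiguous: {', '.join(matches)}")


chosen = resolve_operation("ex")
```

In this sketch `"ex"` is enough to select `extrude`, while `"e"` alone is rejected because it also matches `erase`.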

The style of menuing used is pull-down or drop-down,
similar to those used in several recent integrated operating
environments (such as Apple’s Macintosh [7]). The
operations are grouped into several small menus according
to some similarity of use and each menu has a name. Small
groups of these menus are clustered together to be
accessible to the user simultaneously. Each operation is
represented in the menus by a descriptive phrase (i.e.
menu button). Normally, the display screen is filled by the
window displaying the 3-D model except for a single line of
text at the top containing the names of the menus available
in the currently active cluster. By pointing to a menu
name, the user pulls down the menu, whereupon the list of
operations available in that menu appears immediately on
the screen. The user drags to the desired operation to
select it and, on doing so, the model display window is
immediately restored and the selected operation begins.
Within the individual operations, additional input
requested of the user is performed graphically, whenever
appropriate, by pointing to locations in the current plane,
by picking existing elements of the model, or by answering
a question from a multiple choice dialogue menu. In
addition to activating a particular operation, selecting an
item from a menu can also be used to effect a change to a
different active menu cluster or a change in the contents of
a pull-down menu.

There has been a strong commitment, beginning with the
initial designs for Facet, to a configurable user interface.
This is primarily embodied in the ability of the users to 
define their own set of command menus. In the first place, 
the users may define the names used in the menus to 
represent the commands. Second, they may define the 
grouping of commands to form menus and the names of the 
menus. Third, they may define the clustering of the 
menus. Finally, the dynamics of when the clusters and 
menus change may be fully customized. Configurable 
interaction is provided in order that the designers may 
tailor the interface according to their specific needs and 
desires since each designer might evolve an uncommon 
approach to design using this modeling medium. In 
addition, an individual designer will find it useful to 
organize the interaction differently for varying types of 
projects. 

Brown [1] has defined a methodology that might be used 
to specify the structure of a menu network. His approach 
is not different from the one used in Facet. Both 
approaches define the structure using a predefined syntax 
but Facet does not provide any mechanism for explicitly 
controlling the complexity of the menu structure. We did 
not wish to impose, at the time of implementing the menu 
package, any untested assumptions about what the 
desirable properties of a menu structure would be for our 
applications. Because of this uncertainty about what 
constitutes “good” structuring we provide one low-level 
structuring tool that in Brown’s terms is the GOTO 
statement. 
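
To make the idea of a menu network with a single GOTO-style structuring tool concrete, here is a small sketch. The data layout, the menu labels, and the `MenuState` class are all hypothetical; Facet's actual definition syntax is not given in the paper.

```python
# Hypothetical menu network: cluster -> menu name -> list of items.
# Each item is (label, operation or None, cluster to GOTO or None).
MENU_NETWORK = {
    "main": {
        "Edit": [("Extrude", "extrude", None), ("Views...", None, "viewing")],
    },
    "viewing": {
        "View": [("Perspective", "set_perspective", None), ("Back", None, "main")],
    },
}


class MenuState:
    """Tracks the active cluster and applies GOTO transitions on selection."""

    def __init__(self, network, start):
        self.network = network
        self.cluster = start

    def menu_bar(self):
        """Names shown on the single text line at the top of the screen."""
        return list(self.network[self.cluster])

    def select(self, menu, label):
        """Return the operation to run (or None) and follow any GOTO."""
        for item_label, operation, goto in self.network[self.cluster][menu]:
            if item_label == label:
                if goto is not None:
                    self.cluster = goto
                return operation
        raise KeyError(label)


state = MenuState(MENU_NETWORK, "main")
first = state.select("Edit", "Extrude")
second = state.select("Edit", "Views...")
```

Because every transition is an unrestricted jump, nothing in this structure limits how tangled a user-defined network can become, which is the trade-off the text describes.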

If the user is going to be permitted to repartition his 
menus and order them to his liking there must be some 
discipline imposed on the structure of the application 
program. This may be done through the syntax of the 
menu structure definition. It is well known that these
problems of structure definition exist and many people
have proposed solutions [6, 8]. Facet is implemented using
yet another solution which will be described in more detail 
in a future paper. 

4. Model Display Techniques 

As a physical system modeling tool (which might often be
used during the design process) it is important for Facet to
provide the user with a strong perception of the shapes and
forms being modeled. This visual perception is readily
gained through the ability to look at the 3-D model from
any desired point of view. Facet displays selected parts of
the model (the working set, as presented in the following
section) on the screen as a wire frame (i.e. with the edges
of each polygon drawn as lines) while the user is
interactively constructing or modifying the model. The
user is able to select a view graphically by pointing to the
desired location of both the viewing position and the center
of interest. Alternatively, orthographic views may be
selected with the picture plane placed in the current plane
(which may be readily positioned to any desired location in
3-D space).

Selected views may be saved for later use in view files. 
Although the specifications stored in such files represent a
2-D projection of the 3-D model, they are not specific to
the particular display screen in use when they were saved,
so that the views may be subsequently displayed on
other devices including pen plotters and dot matrix
printers. In addition to the location of the elements in the
2-D view, the view files also contain depth information so 
they can be displayed with polygons rendered as shaded 
surfaces with any hidden surfaces removed. Also, view 
files can be automatically regenerated, on request, to 
update a previously saved view after the model has been 
modified. 

5. Effectively Dealing with Model Complexity

Facet is able to accommodate large complex models. 
However, complexity by itself is of no advantage: it 
degrades the speed of interacting with the model and leads 
to visual clutter, logical confusion, and chaos. In order for 
the user to deal effectively with complex models, Facet 
provides several mechanisms for structuring and organizing 
the elements of a modeling project. Facet allows the user 
to develop abstractions and aggregations to work with 
selected aspects of the model in isolation from other 
aspects. In Facet there are several basic mechanisms for
organizing the model: the family tree, collections,
symbolic naming, and the working set. Some CAD
systems, usually those based on automated drafting and
used primarily to generate 2-D drawings, provide a simple
method of partitioning a project (i.e. drawing) into several
separate entities (usually called layers or overlays), a
technique that is derived from traditional drafting methods.
In order for a 3-D modeling tool, such as Facet, to be used
effectively with complex projects, richer ways of organizing
the modeling medium are required.

Complex models are at the same time a necessity and a 
burden. Models of complex artifacts are made of many 
discrete components that often have overlapping conflicting 
design criteria. It is often desirable to accurately model 
many of these components and features. On the other 
hand, we want to limit the complexity to something that is 
manageable and still useful as an aid to analysis and 
communication; time and cost will also put a limit on the 
desired complexity. 

5.1. Abstractions and Aggregation 

A useful rule in balancing between the opposing needs for
complexity and simplicity is to model only enough detail to
adequately predict the performance of the artifact³ and to
adequately communicate the model to others. Even with a
conscious effort to limit model complexity, the level that the
human user is able to deal with effectively is quickly
exceeded. We must accommodate this overload by learning
to make abstractions and aggregations.

³ The performance required of an artifact might be to satisfy some
form of aesthetic evaluation as well as structural, functional, or other
types of more readily calculated performance.

All models that fall short of embodying the emulated 
artifact itself are in one way or another abstractions. 
When abstractions eliminate the need to deal with details 
that are irrelevant at the moment, the user is free to 
concentrate on a few related criteria at a time. For 
example, two simple boxes of appropriate proportions 
might be used as an abstraction of the World Trade Center 
to do site positioning studies. Often, it is helpful to 
abstract an abstraction and so on, establishing many layers 
or a hierarchy of abstraction. 

With aggregation we can group items similar in feature, 
purpose, or design, allowing us to work with the group 
efficiently, in isolation from other items and groups, and 
apply operations to the entire group in a like fashion. For 
example, advantageous groupings in a building might 
include the structural, electrical, or mechanical 
components; spaces serving a similar function, such as 
stairways; or different components from the same 
manufacturer. 

5.2. Model Organization in Facet

The previous requirements drove the development of the 
four organizational mechanisms provided by Facet. These 
are the family tree, collections, symbolic naming, and the 
working set. 

The family tree allows a hierarchical organization of the 
elements of a model. The nodes of the tree are called 
members with each having a parent member, a set of zero 
or more sibling members (all having the same parent), and 
a set of zero or more child members. At the base of the 
family tree is the root member that does not have a parent, 
is an ancestor of all other members in the family tree, and 
is provided automatically in a new project. The family 
tree is inherently a spatial organizational mechanism. In 
the first place, all parts, or spatial elements, of the model 
exist within the hierarchy of the family tree. In addition, 
the content of each member in the family tree (other than 
the hierarchical structuring information) is a location, 
orientation, and scale in 3-D space that is applied to an 
optionally referenced part description (called a prototype in 
Facet). Each prototype may be referred to by one or more 
members allowing many discrete instances of similar parts 
with only a single description being stored. Finally, the 
location/orientation/scale of a particular member is not 
defined in absolute terms within the 3-D world, but is 
relative to that of its parent in the family tree. Thus, an 
operation performed on a particular member may also 
affect all descendants of that member. For example, 
moving a member to a new location may be done in a 
fashion that effectively moves all descendants in a like way 
so that the relative positioning of the member and its 
descendants remains the same. 
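
The family tree's relative placement and shared prototypes can be sketched as follows. This is a simplification under stated assumptions: the `Member` class is hypothetical, orientation is omitted, and scale is taken to be uniform, so each member carries only an offset and a scale relative to its parent.

```python
class Member:
    """Hypothetical family-tree node placed relative to its parent."""

    def __init__(self, name, offset=(0.0, 0.0, 0.0), scale=1.0,
                 prototype=None, parent=None):
        self.name = name
        self.offset = offset          # location relative to the parent
        self.scale = scale            # uniform scale relative to the parent
        self.prototype = prototype    # shared part description, or None
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def world_placement(self):
        """Compose offsets and scales up to the root member."""
        if self.parent is None:
            return self.offset, self.scale
        p_off, p_scale = self.parent.world_placement()
        off = tuple(po + p_scale * o for po, o in zip(p_off, self.offset))
        return off, p_scale * self.scale

    def world_vertices(self):
        """Instance the prototype's local vertices at this member's placement."""
        off, s = self.world_placement()
        return [tuple(o + s * c for o, c in zip(off, v))
                for v in (self.prototype or [])]


root = Member("root")
wing = Member("wing", offset=(10.0, 0.0, 0.0), parent=root)
window = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # one description, many instances
w1 = Member("window-1", offset=(2.0, 0.0, 0.0), prototype=window, parent=wing)
w2 = Member("window-2", offset=(5.0, 0.0, 0.0), prototype=window, parent=wing)
before = w1.world_vertices()
wing.offset = (20.0, 0.0, 0.0)  # moving the wing carries both windows with it
after = w1.world_vertices()
```

Only the wing's own offset changes; the windows keep their stored relative positions and still follow the wing, and both refer to the single `window` prototype.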

A second mechanism for grouping members, without any 
implied spatial connectivity, is provided with collections. 
Collections offer a means of assembling aggregations of 
members for any desired purpose in a manner orthogonal 
to their structure within the family tree. A member may 
be included within several different collections or, on the 
other hand, is not required to be in any collection. Other 
collections may be included in a collection, allowing 
hierarchies of collections. 
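
A brief sketch of collections as groupings independent of the family tree follows. The `Collection` class and the member names are hypothetical; members are represented by their names only.

```python
class Collection:
    """Hypothetical collection: a named set of members plus nested collections."""

    def __init__(self, name):
        self.name = name
        self.members = set()
        self.collections = []

    def all_members(self, _seen=None):
        """Return every member reachable through nested collections."""
        seen = set() if _seen is None else _seen
        if id(self) in seen:  # guard against a collection nested in itself
            return set()
        seen.add(id(self))
        result = set(self.members)
        for nested in self.collections:
            result |= nested.all_members(seen)
        return result


electrical = Collection("electrical")
electrical.members |= {"panel", "outlet-1"}
mechanical = Collection("mechanical")
mechanical.members |= {"duct-3"}
services = Collection("services")
services.collections += [electrical, mechanical]
vendor_a = Collection("vendor-a")
vendor_a.members |= {"panel", "duct-3"}  # the same members, grouped another way
```

A member such as `panel` appears in two collections at once, and neither grouping says anything about where it sits in the family tree.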


The third mechanism, symbolic naming, allows the user
to associate a name (any string of up to 60 characters) with
a particular member or collection. Selection of such a 
component may be done by specifying its name. Facet will 
also, in certain situations, identify components to the user 
by displaying their names. 

The last organizational mechanism is a dynamic sub-set 
of the project model, called the working set. This is the set 
of components that the user has selected to work with 
together at a particular point in a modeling session. All 
the components in the working set are displayed on the 
screen and are readily accessible by the user through the 
Facet operations. By the same token, components not in 
the working set are not displayed and are not generally 
accessible without placing them in the working set. There 
is a set of operations in Facet, collectively called the project
browser, that allows the user to scan through the entire
project model and move components into and out of the 
working set. The project browser permits the user to 
search through any accessible project and copy parts of 
other projects into the current one. 
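
The working set can be pictured as an explicitly managed in-memory subset of the stored project. The sketch below is an assumption-heavy illustration: the `Project` class and its methods are hypothetical, and a plain dictionary stands in for the project stored on disk.

```python
class Project:
    """Hypothetical project with an explicitly managed working set."""

    def __init__(self, stored):
        self.stored = dict(stored)   # stands in for the on-disk project model
        self.working_set = {}        # components resident in main memory

    def bring_in(self, name):
        """Browser operation: place a stored component in the working set."""
        self.working_set[name] = self.stored[name]

    def put_away(self, name):
        """Browser operation: write a component back and drop it from memory."""
        self.stored[name] = self.working_set.pop(name)

    def displayed(self):
        """Only working-set components are drawn and accessible."""
        return sorted(self.working_set)


project = Project({"roof": "roof geometry", "wing": "wing geometry",
                   "ducts": "duct geometry"})
project.bring_in("roof")
project.bring_in("wing")
visible = project.displayed()
```

The user, not a paging algorithm, decides what is resident, so the cost of interactive operations depends only on what has been brought in.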

5.3. Modeling Medium Limitations

In addition to addressing the conceptual difficulties of 
effectively working with complex models, we must also deal 
with the physical limitations of our modeling medium. For
systems with an electronic computer performing an
essential role, such as Facet, the principal limitations are
the memory size and processing speed. The size of memory
regulates the amount of model information that may be 
displayed and manipulated simultaneously. The processing 
speed affects the smoothness of interaction, e.g. when 
redisplaying the model from another viewpoint, selecting an 
item, or performing a spatial modification. The amount of 
processing required for such operations is directly 
proportional to the complexity of the model on display. 

Some modeling systems choose the route of limiting a
project’s size so that it will fit into the computer’s main
memory (examples of this are Design Board Professional by
Mega CADD Inc. [3] and Polycad/10 by Cubicomp
Corp. [2]). These might be called part modelers as opposed
to assembly modelers. Others use a simple form of virtual
or paged memory, pretending that the main memory is
larger than it really is and storing the excess on the slower
disk memory (an example of this is Microcad by
Imagimedia Technologies Inc. [5]). As in any system that
saves project models from one session to the next, Facet
stores its models on disk. Facet does not limit the project
by the size of main memory. It uses the organizational
mechanisms described above to allow the user to select
parts of the model to be operated on together at any given
point (perhaps a set of abstractions or aggregations). This
working set is then resident in main memory and does not
incur the slowdown of continually reading and writing the
data on disk, as in a demand paged memory scheme that is
not able to take advantage of the user’s logical organization
of the project data and working patterns. The user will
normally keep the working set as small as possible,
including only those components relevant to or providing
context for the current operations, to minimize the
processing time for interactive operations. Facet includes a
project browser with which the user may alter the working
set (by adding or removing items) at any time during a
modeling session.

6. Future Directions for Facet

Although Facet is now a complete system, the area of 3-D 
modeling is very wide ranging and we have several 
additional enhancements in progress and planned for 
development over the next several years. In the area of
interactive model construction and editing techniques,
developments that are planned include:

• Suites of application specific modeling operations and 
primitive objects (ultimately user defined primitive 
objects provided through application of a 
programming facility). 

• Additional general purpose modeling operations, e.g. 
patterned replication, polygonal meshes, spatial 
reconstruction (digitizing 3-D shapes from multiple 2-D 
views). 

• Use of image digitization from hardcopy, film, video 
tape, or video camera, for use as surface texture on 3- 
D model elements or as backdrops for the model where 
the synthesized perspective projection of the project 
model is matched to that of the scanned in scene. 

• Interactive rendering enhancement with paint system 
type of techniques and 2-D images as backdrops. 

• Solid modeling - using a winged-edge type of 
polyhedral boundary representation currently running 
under Unix (with robust spatial set operations). 

In the area of general interactive control and menuing, 
we will be experimenting with ways of providing a
displayable map of the entire menu structure, with the
active menu cluster highlighted, to provide an effective aid 
to navigating a complex network. 

The next step would be to provide an interactive menu 
structure editor that operated directly on the displayed 
graphic representation. 

Continuing developments are planned in dealing with the 
graphic display and access of Facet model data. Several 
enhancements to the shaded surface rendering are under 
way. To transfer graphic data to drafting systems, paint 
systems, and other picture enhancement programs, output 
will be provided in IGES and widely used proprietary 
formats. (Currently a library of routines is provided for use 
by client programs that allow direct access to the Facet 
Disk Data Structure.) The current simple form of 
interactive animation set up provided to interpolate camera 
views through key poses will be enhanced to allow more 
flexibility of movement. With the use of new high 
performance hardware display processors (both on the PC 
and other hosts for Facet) the playback of these animated 
sequences will be speeded up to near real-time; interactive 
operations will also benefit from increased dynamic motion 
and speed of redisplay. 
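
Interpolating camera views through key poses might look like the sketch below. The paper does not say which interpolation Facet uses; straight linear blending of the viewing position and center of interest is assumed here, and `interpolate_views` is a hypothetical name.

```python
def interpolate_views(key_poses, steps):
    """Return camera (eye, target) pairs blended linearly between key poses.

    `key_poses` is a list of (eye, target) pairs; `steps` is the number of
    frames generated per segment, not counting the final key pose.
    """
    def lerp(a, b, t):
        return tuple(x + (y - x) * t for x, y in zip(a, b))

    frames = []
    for (eye0, tgt0), (eye1, tgt1) in zip(key_poses, key_poses[1:]):
        for i in range(steps):
            t = i / steps
            frames.append((lerp(eye0, eye1, t), lerp(tgt0, tgt1, t)))
    frames.append(key_poses[-1])  # finish exactly on the last key pose
    return frames


poses = [((0.0, 0.0, 10.0), (0.0, 0.0, 0.0)),
         ((10.0, 0.0, 10.0), (0.0, 0.0, 0.0))]
frames = interpolate_views(poses, steps=2)
```

Each frame is a complete view specification, so playback speed depends only on how quickly the display hardware can redraw the model for each one.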

7. Conclusions 

In Facet we attempted to provide a modeling tool that is
accessible to many designers during the conceptualization
of a project. We have done this by presenting operators
that make generation and change of models fast and
simple. Use of the system has shown that it has the depth
and flexibility to adapt easily to, and extend, the habits
and design processes of an individual designer.

Psychology and the User Interface:
Science is soft at the frontier 

(Abstract of invited talk) 


John M. Carroll 
User Interface Institute 
IBM T.J. Watson Research Center 
Yorktown Heights, NY 10598, USA 


One source of intellectual overhead that every science inflicts
on itself periodically is the clarion call to be hard,
to establish methodological ground rules so severe that they 
will insure that good science can prevail. This romantic 
notion would only be that if it were not for the fact that 
these fits of methodological purification have typically led 
to conceptual and empirical poverty. The excesses of 
positivism and its crippling effects on the sciences from 
physics to psychology are still in recent memory. 

Newell and Card, in an invited article in the journal
Human-Computer Interaction, have undertaken a modern
variant of this methodological cleansing. However, in most
respects their motivation and arguments are precisely those
of the positivists. They urge that the psychology of
human-computer interaction be hardened, meaning that
it must more uniformly subscribe to parameter fitting,
calculation, and quantitative approximation. They are explicit
in identifying as their motivation the fear that the harder
disciplines of user interface design and artificial intelligence
will not take usability psychologists seriously unless the
psychologists have hard methods. They suggest a modified
Gresham’s Law: "hard science drives out the soft," as if this
is both inevitable and a good thing.

My own view is that science is always soft at the frontier. 
The psychology of human-computer interaction is at a 
frontier of method and theory in psychology and a frontier 
of technology and application in computer science. To me, 
it is fantastic to insist that we start right out on a hard 
psychological theory to guide designs for integrated
co-authoring applications on workstations that support
multimedia input/output when we can barely couch such a
theory for well-worked, toy domains like cryptarithmetic 
and chess. 

Newell and Card are too concerned with the form of science
and too little concerned with its content. They urge
calculation and quantitative approximation but seem almost blasé
about what exactly is calculated or approximated. At best,
Newell and Card’s discussion is very premature; more likely,
it threatens to set the psychology of human-computer
interaction backward by confusing the project of developing
a fundamental understanding of usability and user psychology
with the engineering practices we might be able to develop
if we had such a science base to begin with.

This talk has four parts. In the first, I consider Newell and
Card’s clarion call for hard science, reviewing a critique
developed jointly with Robert Campbell of IBM Research.
Campbell and I argue: (1) that Newell and Card misunderstand
and underestimate how psychology currently contributes
to interface design and thus set out to solve a
nonexistent problem; (2) that they misunderstand and
oversimplify the system design process, and that indeed only
by doing so can they find a role in it for their clumsy hard
science; (3) that their replies to existing criticisms of their
hard science are uniformly without serious content.

Their reply to the charge that their hard science is too low
level is essentially to redefine “psychology” so that it
perfectly coextends with their enterprise, leaving critics to
attack psychology and not them. Their reply to the charge
that their hard science is too limited in scope is to try to
assimilate a variety of current work (much of it not so low
level) to their enterprise merely by saying “it fills out our
‘vision’.” (Notably, these two replies, taken in conjunction,
are self-contradictory.) Finally, their reply to the charge
that hard science takes too long to help at all in the
development process is to say that the elaboration of interface
technology in fact takes place more slowly than everyone
thinks it does!

In the second part of the talk I examine some of the current
research work in human-computer interaction that is
paradigmatically hard. I argue that the psychology of
human-computer interaction, like psychology generally,
suffers from a methodological bias for posing elegant,
either-or research questions that idealize away variables like
task context, e.g., "is mouse driven pointing control better
than a velocity control joystick?" Perhaps the question
should be: "under what circumstances is a mouse the right
design choice, and under what circumstances is a velocity
control joystick the right choice?" Hard psychologists seem
too willing to trade off ecological scale for laboratory
tractability (e.g., a study of command languages that
examines a command set of 3 commands when realistic scale
would be 1-2 orders of magnitude larger). The hard science
of Newell and Card rests fundamentally on baldly
unreasonable idealizations (e.g., assuming errorless performance
for purposes of theory when in fact obtained error rates
exceed 30 percent).

We need to concentrate on the important facts of user
behavior, not ignore them because they lie outside our
methodological purview. We must of course strive to make
our science harder (in the usual sense of "more systematic").
But we must also guard against too much weight
being given to superficial rigor and too little to the practical
value of our theories in guiding the design of new
technology.

In the third part of the talk, I examine the area of artificial
intelligence research specifically directed at the construction
of advisory expert systems (intelligent help and training
facilities). Newell and Card might find it startling that a
domain in the mainstream of AI, which they describe as
hard, in fact has no systematic theoretical foundations (no
constitutive theories of types of general skill, no principled
taxonomy of knowledge domains, no user models that do
not obviously violate fundamental facts about human
learning and the growth of knowledge).

Indeed, this supposedly hard research area has no
comprehensive methodology: experimental systems are routinely
designed with the paramount goal of providing advice to
users without any systematic consideration of how people
give and take advice, what their real problems, goals, or
needs are, etc. The field has a large inventory of dialog
techniques, for example, but no understanding of the
circumstances under which particular techniques are useful or
of how to integrate various techniques to capitalize fully on
prior work. Finally, there is no effective engineering aspect
to this work: no one knows how to develop advisory expert
systems with limited resources or on short schedules. It is
simply incredible that anyone who understood the state of
the art in this field could hold it up as a paradigm of hard
science.

In the final part of the talk, I consider what
human-computer interaction might need in the way of a soft
science, a conceptually richer and methodologically less limited
science. I urge that we recognize that in a rapidly evolving,
technology-driven area hard science can never drive out the
soft. Rather it consolidates those areas that have become
well-worked. We must learn better how to use soft science
to identify concepts and behavioral phenomena that are
really worthy of quantification and other "hard" analysis.
We must develop an arsenal of realistic empirical methods,
methods that efficiently and reliably produce information
at the right level to impact the application of new
technology, not merely at a convenient level. For example,
collecting and taxonomizing users’ critical incidents or thinking
aloud protocols may generate information more directly
pertinent to an iterative design process than a record of
individual keystroke times, but the keystroke times are
"hard," more convenient to collect, and more familiar and
routine to analyze.

We must develop qualitative theories, and means for
expressing such theories. For example, a list of user
knowledge states, with associated transition rules, may be more
relevant to guiding the design of new technology than an
equation describing a fitted curve of millisecond differences
between performance means. Finally, we must extend the
scope of our theories. For example, if users routinely make
many errors, then our theories should incorporate errorful
as well as errorless behavior. Empirical taxonomies of error,
and even rough theories of action slips, abductive reasoning,
and learning via metaphor and analogy are soft science, but
perhaps critical if we are to have a serious and effective
science of human-computer interaction.

In summary, it is elementary in the history of science that
one cannot legislate the quality of the conceptual and
empirical content of science merely by legislating the
methodological form. In fact, if history is any gauge, a priori
limitations on acceptable methods usually have an
undermining effect on conceptual and empirical quality. Newell
and Card are mistaken in their attempt to confine the
psychology of human-computer interaction. Their view of hard
science is arbitrary and in particular has been a fairly
well-documented failure in providing real leverage in interface
design, conceptually and empirically. Their view of AI as
hard is similarly inaccurate, as evidenced by the subfield of
advisory expert systems. Finally, there are routine
alternatives to their hysterical and dismal clarion call.

Gresham’s Law states that "bad money drives out the
good", but it does not suggest that we accept this as our
inescapable fate. Rather, it suggests that we protect good
money by responsible fiscal policies. I suggest that we
protect soft science by responsible methodological policies.
Whenever a scientific program is championed on purely
methodological grounds, we should cringe.

LEARNING GRAPHICS PROGRAMMING BY DIRECT COMMUNICATION


Martin Tuori 
Tim Pointing 

Defence and Civil Institute of Environmental Medicine 
PO Box 2000 

Downsview, Ontario, M3M 3B9 


ABSTRACT 

The process of learning the graphics functions of a 
computer graphics workstation environment is both 
assisted and hampered by the presence of an intermediary 
programming language. Assistance comes in the form of 
programming language functions for preprocessing, 
storage declaration, expression evaluation, control flow, 
and system libraries. Working against the student, 
compilation of programmed examples is slow, and errors 
may arise both from the syntax and semantics of the 
graphics functions, and from those of the programming 
language. 

We propose an approach to learning the graphics 
functions that temporarily separates the graphics 
component from other aspects of the overall programming 
environment; in a sense, we are proposing training wheels 
for the graphics subsystem. 

This approach was used in creating, for the IRIS
series of workstations,¹ a graphics interpreter that
allows a student to test out graphics concepts without the
need to write programs. Subroutine calls typed to the
interpreter are carried out immediately, allowing a quick, 
trial-and-error approach. We argue that this approach is a 
useful addition to conventional learning techniques, and 
that its success can be attributed to bringing the student 
programmer into more direct communication with the 
graphical components of the programming environment. 


1. IRIS is a trademark of Silicon Graphics Inc. 


INTRODUCTION 


A student learning the details of a new computer 
graphics programming environment may employ many 
different techniques. He may begin by reading the 
manufacturer’s documentation, which, if well written, 
conveys basic concepts, syntax, semantics and 
suggestions for efficient use of the computer graphics 
system. While this is an important stage in the student’s 
training, it is not enough to give him fluency as a 
graphics programmer. Writing small test programs is
a good way to proceed, and is made much easier if a
sample skeleton program, or stub, is provided. As Duff
says, “Whenever possible, steal code” [Duff 1985]. The
student can extend the stub to exercise individual features 
of the graphics environment, or combinations of features, 
thereby gaining familiarity with the concepts and 
behaviour of the system. 

A high-level programming language provides a 
variety of features that can help the student in his 
exploration of a system and its functions. Macro 
preprocessing serves two useful functions — common 
constants and expressions are provided in system files, for 
inclusion in new programs, and the student can define 
macros to suit his own needs. Storage declaration
provides for the definition of complex objects, and for
loading them from external sources. Expression evaluation
allows results from one operation to be used as input to 
another; for example, reading pixel values from a raster 
display, modifying and redisplaying them. Features for 
control flow allow conditional, iterative and recursive 
action. Finally, various support libraries for 
mathematical, input/output, networking and other 
functions offer specific functionality, as needed. 
Although the student can defer the use of some language 
features, such as specialized subroutine libraries, he 
cannot avoid the basic syntax and semantics of the 
programming language itself.

Many of the basic language features are of little
interest, initially, to the student studying a graphics 
subsystem; rather, he needs to concentrate on the graphics 
subroutine calls. The programming approach is error- 
prone, tedious and time-consuming. The student’s efforts 
at writing even small, correct programs are invariably 
delayed by errors in the syntax or semantics of the 
programming language; these must be corrected by 
repetitive editing, compilation and testing. 

These problems arise because the programming 
approach is indirect. The student needs to test his skills 
in using the graphics functions, but is forced to do so 
through an intermediary, albeit high-level, programming
language (Figure 1). If the programming language is 
interpretive (some implementations of Basic, Lisp, APL, 
etc.), test runs can proceed quickly; in many cases, 
however, the programming language is compiled (most 
implementations of C, Pascal, Fortran, etc.), and 
considerable time is spent waiting for compilation to take 
place. Since the student’s efforts are highly exploratory, 
characterized by tens or hundreds of trial and error steps, 
considerable time and system resources may be wasted. 


A DIRECT INTERPRETIVE APPROACH 


Recent literature on Human-Computer Interaction 
(HCI) has promoted the use of direct manipulation 
[Shneiderman 1983], [Kay 1984], [Hutchins, Hollan and 
Norman 1986], [Witten and Greenberg 1985]. 
Shneiderman characterizes direct manipulation by: the 
visibility of the object of interest; rapid, reversible, 
incremental actions; and replacement of complex 
command language syntax by direct manipulation of the 
object of interest. 

The situation here is different, in that there is no 
easily defined visible object of interest. The student is 
studying the process, or language of graphics 
programming. Perhaps the conventional approach, in 
which we use a complex command language (the 
programming language interface) should ultimately be 
replaced by more visual, manipulative, or
demonstrative programming methods. We are somewhat 
constrained, however, by the present state of 
programming support on graphics workstations; the 
student must learn to control a graphics system through a 
highly linguistic interface. Our objective here is not to 
introduce direct manipulation, but to offer direct 
communication between the programmer and the graphics 
library.

A student’s initial, exploratory efforts are better
supported by a fast, interpretive interface (Figure 2). The 
student should be able to compose requests for graphic 
subroutine calls, and have them carried out directly, with 
immediate visual results. The approach is not new, but 
can be dated back at least to Turtle Geometry [Byte 
1982], [Papert 1980]. The work presented here is not 
intended to teach children to use computer graphics, or to 
teach them problem-solving skills; it is intended to teach 
the programming details of two- and three-dimensional 
shaded graphics in environments like the IRIS 
workstation [Silicon Graphics Inc. 1984]. 

A graphics interpreter should act as an additional
tool in the student’s kit. As he progresses, he will need to
try writing real programs; this transition is easier if the
language of the interpreter corresponds, as closely as
possible, with the style of programming that will 
ultimately be demanded of the student. Although there is 
a temptation to provide additional functionality in the 
form of high-level primitives for drawing, menu-driven 
interfaces, etc., this must be resisted, unless those 
primitives are part of the toolkit the student will later use. 
The objective here is not to create yet another language 
for graphical expression, but to mimic, as closely as 
possible, the existing graphical component of the high- 
level programming language. 


We have constructed an interpreter for the IRIS
workstation, called irisinterp, or ii. In ii the following
sequence of commands produces a perspective view of a
coloured box with a white top:

    perspective(600,1,1,2000)
    makeobj(1)
    /* a tall red box */
    color(1)
    polf(5, 0,0,0, 10,0,0, 10,0,40,
            0,0,40, 0,0,0)
    polf(5, 10,0,0, 10,10,0, 10,10,40,
            10,0,40, 10,0,0)
    polf(5, 10,10,0, 0,10,0, 0,10,40,
            10,10,40, 10,10,0)
    polf(5, 0,10,0, 0,0,0, 0,0,40,
            0,10,40, 0,10,0)
    color(7)
    polf(5, 0,0,40, 10,0,40, 10,10,40,
            0,10,40, 0,0,40)
    closeobj
    color(0)
    clear
    lookat(45,45,50,0,0,15,1150)
    callobj(1)



Figure 2: A More Direct, Interpretive Approach

Punctuation and other syntactic details are relaxed in ii;
trailing semicolons (signifying the end of a statement in 
C), parentheses for subroutine arguments, and commas 
are treated as white space. Readers familiar with the 
IRIS programming environment will recognize that, with 
a few changes in punctuation, this sequence could be 
turned into a C program to carry out the same function. 
In fact, it is part of a longer sequence to draw the 
coloured boxes example provided in the manufacturer’s 
documentation. With this sequence, the student can more 
easily follow the documented description of three- 
dimensional viewing controls, use of the z-buffer, etc. 
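
The relaxed punctuation described above can be illustrated with a short sketch. It is not the ii source (which is written in C); the function name parse_line and the choice to convert numeric tokens to integers are assumptions made for illustration, and comments such as /* ... */ are not handled.

```python
import re

# Parentheses, commas and semicolons are treated exactly like white space,
# so "color(1);" and "color 1" split into the same tokens.
_SEPARATORS = re.compile(r"[\s(),;]+")


def parse_line(line):
    """Return (routine_name, args) for one command line, or None if blank.

    Hypothetical helper: numeric tokens become ints, anything else
    (for example a symbolic constant or an axis name) is kept as text.
    """
    tokens = [t for t in _SEPARATORS.split(line) if t]
    if not tokens:
        return None
    name, raw_args = tokens[0], tokens[1:]
    args = []
    for tok in raw_args:
        try:
            args.append(int(tok))
        except ValueError:
            args.append(tok)
    return name, args
```

Under these assumptions, a line copied from a C program and the same line typed without any punctuation produce the same request, which is the property that makes the later move to C programming easy.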

Our early experience with ii led us to extend the
basic concept, somewhat, to include the following
features. A script inclusion feature has been added, by
which a file containing ii instructions can be called, for
insertion into a sequence; for example, the following
sequence performs simple animation by calling the boxes
script, and then three times clearing the screen, rotating by
a further 5 degrees about the z-axis, and redrawing the object:


    script(boxes)
    color(0)
    clear
    rotate(50,z)
    callobj(1)
    color(0)
    clear
    rotate(50,z)
    callobj(1)
    color(0)
    clear
    rotate(50,z)
    callobj(1)


This allows longer sequences to be prepared with a text
editor, tested and refined. It also allows the development
of a set of tutorial examples. Scripts may be nested to a
pre-determined limit, but recursion is ineffective, due to
the lack of a method for expressing conditional
termination. Standard defined constants are provided for
boolean values, colours, and screen limits:



    color(magenta)
    move(0,0,0)
    draw(xmaxscreen,ymaxscreen,0)

Figure 3: Using ii in a Multiple Window Environment

We have resisted the temptation to add declarative and
iterative capabilities, partly because of the implementation
effort they would require, and partly because the student who is
ready to use those features is ready to move on to 
programming in C. 
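
Script inclusion with a nesting limit, together with named constants, might be sketched as follows. This is an illustration only: the limit MAX_DEPTH, the constant values, and the use of a dictionary in place of script files are all assumptions, since the text gives none of these details.

```python
MAX_DEPTH = 4  # assumed value; the actual pre-determined limit is not stated

# Assumed values, for illustration only.
CONSTANTS = {"TRUE": 1, "FALSE": 0, "magenta": 5,
             "xmaxscreen": 1023, "ymaxscreen": 767}


def _value(token):
    """Turn one argument token into an int, a constant's value, or text."""
    if token.lstrip("-").isdigit():
        return int(token)
    return CONSTANTS.get(token, token)


def expand(lines, scripts, depth=0):
    """Flatten script(...) inclusions into a list of (name, args) commands.

    `scripts` maps a script name to its list of lines; it stands in for
    the files that a real interpreter would read.
    """
    if depth > MAX_DEPTH:
        raise RecursionError("scripts nested too deeply")
    commands = []
    for line in lines:
        for ch in "(),;":
            line = line.replace(ch, " ")
        tokens = line.split()
        if not tokens:
            continue
        name, args = tokens[0], tokens[1:]
        if name == "script":
            commands.extend(expand(scripts[args[0]], scripts, depth + 1))
        else:
            commands.append((name, [_value(a) for a in args]))
    return commands
```

Because there is no conditional, a script that includes itself can only stop by hitting the depth limit, which is why recursion is of no practical use in such a scheme.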

A few subroutine calls from the IRIS GL-2 graphics 
library are not supported in ii, because they are 
inappropriate in this context. For example, the subroutine 
callfunc() requires the address of a C subroutine, to be
called from within a graphical object; the student using ii 
has no way of determining such an address. Similarly, 
the subroutine defrasterfont() requires an array containing 
the bitmap definition of a raster font; the student cannot 
be expected to type in such an extensive data structure. 

The ii program has been useful in our efforts to 
explore and understand the IRIS programming 
environment. For example, details of the window-to- 
viewport coordinate transformation were initially 
confusing; trial and error with ii helped considerably. 
Interactions between z-buffer and double-buffered display 
techniques were also explored easily using the 
interpreter. As our skill at programming the IRIS
increases, we still return to ii occasionally to check out 
details of some graphics functions. In the multiple 
window environment (MEX) of the IRIS, it is easy to 
digress from a programming task to try out an idea using 
ii, and then return with the solution in hand. An example 
is shown in Figure 3, in which a text editor (upper left) is 
being used on the boxes script, the ii program is being 
run from a partially obscured window in the bottom left, 
and the graphical output from ii is at the upper right. 

The development of ii was, not surprisingly, a 
tedious task. Data typing in C is sufficiently strong that 
subroutine calls, with arbitrary numbers and types of 
arguments, cannot be assembled and executed 
dynamically. It was necessary to group subroutines by 
their calling-sequence patterns, and use a common stub to 
assemble appropriate arguments and dispatch the request, 
as appropriate. For example, routines that take four short 
integers as input form one group, while those that take 
four pointers to short integers form another. In all, the 
source code for ii takes 30 pages; the compiled program 
is quite large, at 164 k-bytes, since it includes most of 
the GL-2 graphics library. 
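
The grouping of routines by calling-sequence pattern can be sketched in miniature. The sketch below is written in Python rather than C, so it does not face the typing problem described above; it only illustrates the table-driven structure. The routine bodies are stand-ins that record their calls, and the group names and table layout are assumptions, not taken from the ii source.

```python
calls = []  # records what the stand-in "library" was asked to do


# Stand-ins for graphics routines (hypothetical, not real GL-2 bindings).
def color(index):
    calls.append(("color", index))


def move(x, y, z):
    calls.append(("move", x, y, z))


def draw(x, y, z):
    calls.append(("draw", x, y, z))


def clear():
    calls.append(("clear",))


def stub_ints(count):
    """Build the common stub for routines taking `count` integers."""
    def stub(routine, args):
        if len(args) != count:
            raise TypeError("expected %d arguments, got %d" % (count, len(args)))
        routine(*(int(a) for a in args))
    return stub


# One stub per calling-sequence pattern.
GROUPS = {
    "no arguments": stub_ints(0),
    "one integer": stub_ints(1),
    "three integers": stub_ints(3),
}

# Each routine is filed under the group that matches its calling sequence.
ROUTINES = {
    "clear": (clear, "no arguments"),
    "color": (color, "one integer"),
    "move": (move, "three integers"),
    "draw": (draw, "three integers"),
}


def dispatch(name, args):
    """Look up a routine and call it through the stub for its group."""
    routine, group = ROUTINES[name]
    GROUPS[group](routine, args)
```

Adding a routine then means adding one table entry, provided a stub for its pattern already exists; a new pattern (for example, four pointers to short integers) requires a new stub.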

The savings afforded by ii are significant. A short 
sample program, that draws a three-dimensional cube 
intersected by a plane, occupies 1,390 characters of text 
in ii, whereas the C source takes 2,119 characters. 
Compilation in C takes 37 seconds on an IRIS-2400 (no 
other users), and the compiled program is 61,440 bytes 
long. 


CONCLUSION 

In this paper, we have described a learning situation 
(graphics programming) in which the student’s efforts are 
hampered by the insertion of an intermediary, high-level 
programming language and its support environment. A 
direct, interpretive approach improves the speed of 
learning, by bringing the student into closer contact, or 
communication, with his objective — the syntax and 
semantics of the graphics subroutine library he is trying 
to learn. Direct communication is an adaptation of the 
concept of direct manipulation, for situations where the
user’s objective is not a visible entity, but a linguistic
process.

VLSI AND GRAPHICS AT THE PIXEL LEVEL*

Abstract


The computational bottlenecks in many interactive raster 
graphics systems are the pixel-level calculations and not the 
display list traversals, geometric transformations or clipping 
computations. We examine several VLSI-based designs that 
focus on these pixel-level calculations, noting the influence of
the price-performance target on system design and component 
selection. We describe in detail the latest results from 
Pixel-planes, our experimental system optimized for algorithms 
in which many of the pixel-level calculations can be formulated 
as linear expressions (Ax + By + C) of the pixel's x,y address. 
We outline variations on the current implementation for greater 
generality or faster speed or lower cost. We show the effects of 
these changes on the algorithms that are to be run on the
alternative implementations. 
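
As a hedged illustration of what it means to formulate a pixel-level calculation as a linear expression Ax + By + C, the sketch below tests which pixels lie inside a triangle by evaluating one such expression per edge. The triangle test is our choice of example, the function names are hypothetical, and the loop evaluates pixels one at a time, whereas the point of the hardware described in the abstract is to evaluate the expressions at the pixel level.

```python
def edge_coefficients(p, q):
    """Return (A, B, C) such that A*x + B*y + C is zero on the line p->q
    and non-negative to its left (for y increasing upward)."""
    (x0, y0), (x1, y1) = p, q
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def covered_pixels(triangle, width, height):
    """Set of (x, y) pixel addresses inside a counter-clockwise triangle."""
    edges = [edge_coefficients(triangle[i], triangle[(i + 1) % 3])
             for i in range(3)]
    return {(x, y)
            for y in range(height)
            for x in range(width)
            if all(a * x + b * y + c >= 0 for a, b, c in edges)}
```

For the triangle (0,0), (4,0), (0,4) the three expressions reduce to 4y, 16 - 4x - 4y and 4x, so a pixel is covered exactly when x, y and 4 - x - y are all non-negative.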

ABSTRACT

The well-known “z-buffer” algorithm for solving the 
visible surface problem has a number of points in its 
favor, the main one being that it is amenable to very 
efficient hardware implementation at little additional 
cost in many existing frame buffer systems. The 
traditional software implementation of the algorithm
assumes explicit initialization of both the image buffer 
and the z-buffer before each image is generated. This 
paper describes a simple technique for synchronizing 
initialization and image generation so the two can be 
performed in parallel, allowing complete overlap to be 
achieved and effectively eliminating the time needed 
for explicit initialization of the frame buffer. The 
technique assumes a modest investment in additional 
hardware within the frame buffer. 

RÉSUMÉ

Parmi les algorithmes de surfaces cachées, l’algorithme
du “z-buffer” a un certain nombre d’avantages à son
actif. Le plus important de ceux-ci étant que cet
algorithme est peu coûteux à implémenter au niveau
hardware dans les systèmes actuels de “frame-buffer.”
Les techniques traditionnelles d’implémentation de cet
algorithme forcent le logiciel à initialiser explicitement
le “z-buffer” et le buffer image avant le transfert de
l’image. Cet article décrit une technique simple qui
permet de synchroniser l’initialisation et la création de
l’image afin qu’elles puissent être réalisées
simultanément. Ceci permet d’éliminer le temps perdu
lors de l’initialisation du “frame-buffer.” Cette
technique suppose un faible investissement en matériel
additionnel à l’intérieur du “frame-buffer.”

Keywords: double-buffering, frame buffer, real-time, 
visible surfaces, z-buffer.

INTRODUCTION

The problem of computing only the visible surfaces 
of a scene has by now been well-studied [12]. One
approach that has gained popularity is the z-buffer
algorithm first described by Catmull [4,5]. With a
z-buffer the depth-sort required of a visible surface 
algorithm is accomplished by maintaining, for each 
pixel, a record of the z-depth of the object whose 
intensity (color) is stored at that pixel. As subsequent 
objects are scan converted into the frame buffer, their 
z-depths are compared and used to decide whether the 
new object is in front of or behind the object currently 
displayed at each pixel. In the former case both the 
intensity and z-depth are changed for the pixel, but in 
the latter case no action is taken. A complete 
explanation of the z-buffer algorithm appears in 
standard text books on computer graphics [7,10].
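
The per-pixel comparison just described can be written down in a few lines. This is a minimal sketch of the traditional software form, not of the techniques developed later in the paper; the names make_buffers and plot are hypothetical, and we assume that a smaller z value means a nearer object.

```python
FAR = float("inf")  # depth assigned to "nothing drawn here yet"


def make_buffers(width, height, background=0):
    """Explicitly initialize the intensity buffer and the z-buffer."""
    intensity = [[background] * width for _ in range(height)]
    depth = [[FAR] * width for _ in range(height)]
    return intensity, depth


def plot(intensity, depth, x, y, z, colour):
    """Store one scan-converted sample if it is in front of what is there.

    Returns True when the pixel was changed, False when the new object
    is behind the one currently stored and no action is taken.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        intensity[y][x] = colour
        return True
    return False
```

Note that make_buffers touches every pixel before any object is drawn; it is this explicit initialization step whose cost the paper sets out to hide.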

In the following sections we first define our model 
of a frame buffer and then look at different ways of 
adding hardware to the frame buffer to speed
up the z-buffer algorithm. The first approach almost 
doubles the amount of memory in the frame buffer and 
is presented solely to motivate the other two. The 
second approach adds only a single bit to each pixel but 
requires a more complicated memory controller that 
could lead to significant timing problems in the video 
chain. The third approach adds two more bits to each 
pixel and hence admits an implementation that requires 
no additional function in the memory controller beyond 
what is available in current frame buffers, although a 
modest change is still required to hardware further 
down the video chain. 

The basic z-buffer algorithm performs well in 
almost all respects except for considerations of 
antialiasing. This deficiency stems from the fact that 
the z-buffer maintains depth information on a pixel-by- 
pixel basis, and thus has no way to discriminate objects 
at subpixel resolution. This is unfortunate because 
aliasing artifacts introduced by the scan conversion 
process can be very objectionable in practice. Some 
researchers have suggested techniques to incorporate 
antialiasing strategies into a z-buffer algorithm, but 
those techniques either require auxiliary storage or
additional processing time beyond the basic scan
conversion algorithm, or they fail to achieve the desired
level of image quality [3,6,8]. The approaches
described in this paper apply to z-buffer algorithms 
enhanced for antialiasing, although we do not address 
the issue explicitly. Instead, we adopt the attitude that 
an important application of the z-buffer technique is to 
high-performance raster systems for which real-time or 
quasi-real-time performance is desired and that proper 
antialiasing is a luxury as yet not consistent with that 
goal. 

We distinguish the update rate, the rate
at which new images are computed by the algorithm,
from the refresh rate, the rate at which the computed
images are displayed on the monitor. The update rate
is almost always established by limits in processing 
power and the desire to display complex images and 
thus tends to remain a bottleneck even as advances are 
made in hardware and software, whereas the refresh 
rate is set by the need to overcome flicker effects 
inherent in the human visual system and can be 
regarded as a relative constant. 

Our definition of real-time will be that a complete 
image is generated within a single refresh cycle (usually 
1/30 or 1/60 of a second). The notion of quasi-real-time 
will imply image generation that closely approximates 
that rate (at worst an update happens every 1/5 of a 
second). Ron Baecker has already championed the 
claim that such rates give an acceptable illusion of 
continuous simulation if the update rate and the refresh 
rate are suitably synchronized [2]. To be effective, a
steady refresh of the current image must be maintained 
throughout the scan conversion process for the next 
image. Because of this, we will be interested only in 
the double-buffered version of the algorithm in which a 
refresh processor displays one image while an update 
processor is generating the next image. 

With this set of ground rules, we are ready to 
discuss the performance of the z-buffer algorithm and 
to examine alternatives to the traditional 
implementation. The operations performed by the 
versions of the z-buffer algorithm that we will consider 
remain largely the same in each of the 
implementations. The differences lie in the way that 
the operations are partitioned between the two 
processors (the update processor and the refresh 
processor) and in the way that the two processors 
synchronize their operations. 

THE FRAME BUFFER 

Our model assumes that a frame buffer contains a 
large amount of pixel memory indexed by two- 
dimensional (x,y) addresses and that each pixel is 
divided into fields composed of a number of bits. For 
our purposes at least three fields are necessary in a 
pixel. These fields will be designated the I0 and I1
fields (two intensity buffers, one for the image currently
being displayed by the refresh processor and the other
for the image being computed by the update processor)
and the Z field (a single depth buffer). Frame buffer 
memory is dual-ported to allow the update processor to 
read and write pixels (randomly) at the same time that 
the refresh processor reads pixels at video rates (in 
scan-line order). The refresh processor passes the pixels 
down the video chain where the color information stored 
in each pixel is converted to analog signals suitable for 
display on a monitor, possibly after interpretation by 
lookup tables or other devices whose function is not 
important to the present discussion. 

Our model for a frame buffer is patterned after 
the Adage/Ikonas RDS 3000, the hardware on which 
we have implemented these algorithms. Most of the 
ideas presented here apply to other frame buffer 
architectures, although we do assume that the video 
chain is similar to the control, crossbar, and lookup 
table modules available in the Ikonas [1,9]. Not all of 
these features are required for most of the z-buffer 
techniques, although the final algorithm presented here 
actually assumes a slightly enhanced crossbar switch 
over what is supplied by Adage. Our goal is not to 
restrict attention to a particular frame buffer 
architecture, but to point toward general hardware 
features that will substantially enhance the performance 
of z-buffer algorithms at modest cost. 

Double-buffering is accomplished in a frame 
buffer by the refresh processor reading from I_0 during 
odd update cycles and from I_1 during even update 
cycles, thus allowing the update processor to use I_1 and 
Z for scan conversion during odd update cycles and I_0 
and Z during even update cycles. Only one z-depth 
field is necessary because once the image has been 
rendered, its z-depths are no longer needed. 

The selection of specific fields for reads and writes 
by both the update processor and the refresh processor 
may be accomplished by masking and shifting (either in 
hardware or through a combination of hardware and 
software) or by using address offset registers when the 
various fields are stored in different areas of frame 
buffer memory. These details are not important for the 
discussion, so we will assume that the frame buffer 
maintains all of the fields associated with a pixel within 
a single “word” and that both processors are capable of 
selecting particular fields with no penalty in time. It is 
convenient to assume that this is accomplished by mask 
registers and shift registers, associated with each 
processor, that perform selective load and store 
operations to only those bits indicated by the mask, 
leaving the other bits of a pixel unchanged. 
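
The mask-and-shift field selection described above can be sketched in a few lines of Python. This is only an illustration: the 24-bit pixel layout, the field positions, and the function names are assumptions made for the example, not the layout of any particular frame buffer.

```python
def masked_store(word, value, mask, shift):
    """Write `value` into the field selected by `mask`; other bits are unchanged."""
    return (word & ~mask) | ((value << shift) & mask)


def masked_load(word, mask, shift):
    """Read the field selected by `mask` as a right-adjusted value."""
    return (word & mask) >> shift


# Hypothetical 24-bit pixel: I_0 in bits 0-7, I_1 in bits 8-15, Z in bits 16-23.
I0_MASK, I0_SHIFT = 0x0000FF, 0
I1_MASK, I1_SHIFT = 0x00FF00, 8
Z_MASK, Z_SHIFT = 0xFF0000, 16

pixel = 0
pixel = masked_store(pixel, 0x12, I0_MASK, I0_SHIFT)
pixel = masked_store(pixel, 0xAB, Z_MASK, Z_SHIFT)
pixel = masked_store(pixel, 0x34, I1_MASK, I1_SHIFT)  # I_0 and Z are untouched
```

Each store changes only the bits under its mask, which is the property the two processors rely on when they share a pixel word.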

Any number of bits may be associated with the 
two intensity buffers. Common configurations use 8 
bits (with color lookup tables to achieve a full color 
space) or 24 bits (8 bits each of red, green and blue). 
The z-depth must be able to discriminate objects within 
the scene, so between 8 and 32 bits are commonly 
assumed (Sutherland and Hodgman discuss a scaling 
strategy for making the best use of the precision 
available [11]). This totals from 24 to 80 bits per 
pixel, depending upon the amount of color and z-depth 
desired. The memory requirement can be reduced 
somewhat if only the intensity buffer currently being 
displayed is located within the frame buffer itself (the 
other fields being stored on the host) but our technique 
is designed for high-performance systems in which all of 
the memory resides within the frame buffer to achieve 
the necessary update and refresh rates. Only the two 
intensity buffers are accessed by the refresh processor in 
any of these schemes, so some savings in cost could be 
achieved by making the z-depth memory single-ported, 
but this might preclude using the memory for other 
purposes and is thus a less general architecture. 

We will label each update cycle by an integer k, but 
often will use only its parity (the low-order bit). Thus 
in the many places in our algorithms where k is 
manipulated as an integer modulo some base (usually 
two), the manipulation can be implemented with simple 
bit operations such as complementation, rather than 
with the more expensive increment and modulus 
operations. 
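
As a small illustration of the parity convention, the sketch below checks that complementing a one-bit counter gives the same sequence as incrementing k and reducing modulo two, and derives the buffer roles from that bit. The function name `roles` is hypothetical.

```python
def roles(k):
    """Return (displayed, updated) intensity-buffer indices for update cycle k."""
    parity = k & 1               # low-order bit of the cycle counter
    return parity ^ 1, parity    # refresh reads I_{(k-1) mod 2}, update writes I_{k mod 2}


# Complementing a single bit visits the same parities as k := k+1 followed by mod 2.
bit = 0
for k in range(1, 9):
    bit ^= 1
    assert bit == k % 2
    assert roles(k) == ((k - 1) % 2, k % 2)
```

During odd cycles this selects I_0 for display and I_1 for update, and the reverse during even cycles.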

The next three sections present increasingly 
sophisticated versions of the z-buffer algorithm. The 
first is the standard implementation, suitable for a very 
basic frame buffer. The subsequent versions achieve 
increased performance by partitioning the calculation 
differently between the update and refresh processors and 
by using modest hardware assistance to synchronize the 
calculations. 

SOLUTION #1 

TRADING OFF MEMORY FOR PROCESSING TIME 

The basic appeal of the z-buffer algorithm is that 
it affords a complete solution to the visible surface 
problem for little more than the time required to 
perform simple scan conversion without the visible 
surface calculation. The algorithm is usually 
implemented entirely in the update processor, with the 
refresh processor serving only to display the resulting 
image on the monitor. The two procedures Update#1 
and Refresh#1 shown in Figure 1 express the 
interlocked cycles in a standard implementation of the 
z-buffer algorithm. Normally the refresh cycle would 
be implemented entirely in hardware, but we describe it 
here as if it were implemented in software to provide a 
uniform presentation of the two processes. The 
processes are loosely synchronized in this basic version 
of the z-buffer algorithm. At the start of each update 
cycle the shared variable k is incremented and this 
causes the refresh processor to swap image buffers with 
the update processor. Actual implementations would 
usually include a provision to further synchronize the 
double-buffering so that buffers are swapped only at the 
end of a complete frame or field time. 

PROCEDURE Initialize#1; 
  k := 0; 
  FOR y := maxY DOWNTO 0 DO 
    FOR x := 0 TO maxX DO 
      I_0[x,y] := background; 
    OD; 
  OD; 
END Initialize#1; 

PROCEDURE Update#1; 
  WHILE true DO 
    k := k+1; 
    FOR y := maxY DOWNTO 0 DO 
      FOR x := 0 TO maxX DO 
        I_{k mod 2}[x,y] := background; 
        Z[x,y] := infinity; 
      OD; 
    OD; 
    FOR every object in the scene DO 
      FOR every pixel (x,y) in the object DO 
        IF (object[x,y].z < Z[x,y]) THEN 
          I_{k mod 2}[x,y] := object[x,y].color; 
          Z[x,y] := object[x,y].z; 
        FI; 
      OD; 
    OD; 
    { optional wait for next frame } 
  OD; 
END Update#1; 

PROCEDURE Refresh#1; 
  WHILE true DO 
    FOR y := maxY DOWNTO 0 DO 
      FOR x := 0 TO maxX DO 
        display I_{(k-1) mod 2}[x,y]; 
      OD; 
    OD; 
  OD; 
END Refresh#1; 

Figure 1. The Standard Z-Buffer Algorithm 

This is the standard z-buffer algorithm presented 
in textbooks and is easily implemented on most frame 
buffers. The main procedure invokes (once) the setup 
in Initialize#1 and then invokes (in parallel) the 
two procedures Update#1 and Refresh#1 which 
never terminate, but cycle continuously as the 
double-buffering scheme alternately updates the two image 
buffers. The refresh processor is merely cycling 
through memory performing the standard frame buffer 
readout to the video hardware. If the algorithm is 
being used to generate a single frame, there is really 
nothing to discourage its use. But if the algorithm is 
being used to generate a sequence of frames (as 
assumed here) the algorithm does not fully utilize the 
available hardware. 
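
For readers who prefer something executable, the following is a minimal sequential sketch of the standard algorithm in Python. It models only the update processor; the refresh processor is reduced to picking the buffer to show. The tiny resolution, the scene, and the representation of an object as a dictionary from pixel to (z, color) are all assumptions made for the example.

```python
import math

WIDTH, HEIGHT = 4, 3
BACKGROUND = 0


def new_buffer(fill):
    return [[fill] * WIDTH for _ in range(HEIGHT)]


def render_cycle(objects, intensity, depth):
    """One update cycle of the standard z-buffer: clear, then scan convert."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            intensity[y][x] = BACKGROUND
            depth[y][x] = math.inf
    for obj in objects:
        for (x, y), (z, color) in obj.items():
            if z < depth[y][x]:          # nearer than what is stored
                intensity[y][x] = color
                depth[y][x] = z


buffers = [new_buffer(BACKGROUND), new_buffer(BACKGROUND)]  # I_0 and I_1
depth = new_buffer(math.inf)                                # the single Z field

# Hypothetical scene: each object maps pixel (x, y) -> (z, color).
near = {(1, 1): (2.0, 7), (2, 1): (2.0, 7)}
far = {(1, 1): (5.0, 3), (0, 0): (5.0, 3)}

k = 0
for _ in range(2):                       # two update cycles
    k += 1
    render_cycle([far, near], buffers[k % 2], depth)
displayed = buffers[(k - 1) % 2]         # what the refresh processor would show
```

Note that the explicit clearing loop at the top of `render_cycle` touches every pixel before any object is drawn; this is the per-cycle initialization cost that the later solutions move to the refresh processor.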

In this situation there is a bottleneck that may 
potentially degrade performance. The z-buffer 
algorithm requires that the intensity and z-depth fields 
be reset to their initial values (background color and 
infinity) before each update cycle. The procedure 
Update#1 performs this initialization explicitly at the 
beginning of each update cycle. This can be time- 
consuming for two reasons. The first is that the update 
processor can only perform initialization when it is not 
performing scan conversion and thus the full bandwidth 
of the update processor is not available for scan 
conversion, an unfortunate consequence because real 
images frequently require substantially more update 
time than refresh time. 

This is related to the second reason, which is that 
even update processors with special purpose hardware 
may not be able to write all of the pixels within the 
frame buffer in one refresh cycle. The refresh 
processor reads multiple pixels during a single memory 
cycle because it looks at pixels in scan-line order and 
thus can access multiple memory chips in parallel. This 
allows it to achieve a complete refresh within one frame 
time. The update processor typically does not do this 
because it is designed for random access to the frame 
buffer. The result is that one or more refresh cycles 
may be “wasted” between successive update cycles 
while the update processor is busy initializing instead of 
rendering. This degrades the real-time or quasi-real- 
time performance of the system by a significant 
percentage. 

Because it accesses multiple pixels during a single 
memory cycle, the refresh processor is capable of 
performing the initialization in a single refresh cycle. 
Some frame buffers support this by allowing the refresh 
processor to change values in selected fields of pixel 
memory as each pixel is written back into memory after 
being read during the refresh cycle [1]. This allows the 
refresh processor to initialize the second image buffer 
and the z-depth buffer in a single refresh cycle. 

Unfortunately, unless initialization is synchronized 
with image generation there is little advantage to this 
approach because the update processor must wait for at 
least one complete refresh cycle after update cycle k to 
ensure that the refresh processor has completely reset 
both the I_{k+1} and Z fields before it can begin update 
cycle k+1. This means that the update processor will 
be idle a significant amount of the time. Recall that the 
slowest update rate for quasi-real-time is one image 
every 1/5 second, so even with a refresh cycle of 1/60 
second the update processor would be idle more than 8% 
of the time; in the worst case the processor would waste 
50% of its bandwidth while maintaining a 1/30 second 
refresh cycle and a 1/15 second update cycle. The 
percentage of idle time for the update processor is an 
important consideration because it determines an upper 
bound on the complexity of the image that can be 
rendered. This problem can be overcome by the addition 
of extra hardware, in this case a substantial increase in 
frame buffer memory. 
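
The idle-time figures quoted above follow from a single ratio: one refresh cycle of waiting per update cycle. The short Python check below reproduces them with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def idle_fraction(refresh_period, update_period):
    """Fraction of each update period spent waiting for one full refresh cycle."""
    return refresh_period / update_period


slow = idle_fraction(Fraction(1, 60), Fraction(1, 5))    # 1/12, a little over 8%
worst = idle_fraction(Fraction(1, 30), Fraction(1, 15))  # 1/2, that is 50%
```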

SOLUTION #2 

TRADING MORE MEMORY 
FOR PROCESSING TIME 

PROCEDURE Initialize#2; 
  k := 0; 
  FOR y := maxY DOWNTO 0 DO 
    FOR x := 0 TO maxX DO 
      I_0[x,y] := background; 
      Z_0[x,y] := infinity; 
    OD; 
  OD; 
END Initialize#2; 

PROCEDURE Update#2; 
  WHILE true DO 
    k := k+1; 
    FOR every object in the scene DO 
      FOR every pixel (x,y) in the object DO 
        IF (object[x,y].z < Z_{k mod 2}[x,y]) THEN 
          I_{k mod 3}[x,y] := object[x,y].color; 
          Z_{k mod 2}[x,y] := object[x,y].z; 
        FI; 
      OD; 
    OD; 
    { mandatory wait for next frame } 
  OD; 
END Update#2; 

PROCEDURE Refresh#2; 
  WHILE true DO 
    FOR y := maxY DOWNTO 0 DO 
      FOR x := 0 TO maxX DO 
        display I_{(k-1) mod 3}[x,y]; 
        I_{(k+1) mod 3}[x,y] := background; 
        Z_{(k+1) mod 2}[x,y] := infinity; 
      OD; 
    OD; 
  OD; 
END Refresh#2; 

Figure 2. The Triple-Buffered Z-Buffer Algorithm 


To achieve real-time or quasi-real-time 
performance a third image field I_2 can be added to the 
frame buffer along with a second z-depth field Z_1 (the 
original depth field becomes Z_0). In this case the 
update processor cycles between three image memories, 
with the refresh processor displaying from I_{k-1}, the 
update processor writing into I_k, and the refresh 
processor initializing I_{k+1} (all subscripts for I are now 
modulo 3 instead of modulo 2). Similarly, the update 
processor uses Z_k for its visible surface calculation while 
the refresh processor is initializing Z_{k+1} (these subscripts 
are modulo 2 since only two depth buffers are 
required). 
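
The rotation of buffer roles can be made concrete with a small Python sketch. The function name and the dictionary keys are hypothetical; the index arithmetic follows the description above.

```python
def triple_buffer_roles(k):
    """Buffer indices in use during update cycle k of the triple-buffered scheme."""
    return {
        "display": (k - 1) % 3,      # refresh processor shows I_{k-1}
        "update": k % 3,             # update processor writes I_k ...
        "update_depth": k % 2,       # ... using Z_k
        "clear": (k + 1) % 3,        # refresh processor initializes I_{k+1} ...
        "clear_depth": (k + 1) % 2,  # ... and Z_{k+1}
    }


for k in range(1, 7):
    r = triple_buffer_roles(k)
    # The three intensity roles always land on three different buffers.
    assert len({r["display"], r["update"], r["clear"]}) == 3
    # The depth buffer being cleared is never the one in use.
    assert r["update_depth"] != r["clear_depth"]
    # What is cleared now is exactly what the next cycle updates.
    nxt = triple_buffer_roles(k + 1)
    assert (r["clear"], r["clear_depth"]) == (nxt["update"], nxt["update_depth"])
```

The assertions express why the scheme needs no waiting: the buffers being cleared in one cycle are precisely those the update processor picks up in the next.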

As long as each update cycle requires at least one 
entire refresh cycle (a modest assumption since a faster 
update rate would imply that the image was being 
updated faster than it was being viewed on the monitor) 
the refresh processor will be able to initialize a new 
image and depth buffer in time for each update cycle, 
thus freeing the update processor from any overhead for 
initialization. Procedures Update#2 and Refresh#2 
to accomplish this are straightforward modifications to 
Update#1 and Refresh#1. 

The clear drawback to this scheme is the massive 
increase in frame buffer memory. The requirement for 
image memory has increased by 50% (from two buffers 
to three) and the requirement for depth memory has 
increased by 100% (from one buffer to two). For the 
case of a full 24-bit image buffer and a 32-bit depth 
buffer, this is a total of 136 bits per pixel, an increase 
of 70%. While this might be appropriate for the 
increased performance, we are at best getting an 
improvement that is of the same order of magnitude as 
the increase in memory cost. We can do much better. 
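
The memory figures above can be checked with a few lines of Python, assuming the 24-bit intensity and 32-bit depth fields used in the text; the function name is hypothetical.

```python
def bits_per_pixel(intensity_bits, depth_bits, intensity_buffers, depth_buffers):
    """Total pixel width for a given number of intensity and depth fields."""
    return intensity_buffers * intensity_bits + depth_buffers * depth_bits


double = bits_per_pixel(24, 32, intensity_buffers=2, depth_buffers=1)  # Solution #1
triple = bits_per_pixel(24, 32, intensity_buffers=3, depth_buffers=2)  # Solution #2
increase = (triple - double) / double
```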

SOLUTION #3 

TRADING OFF HARDWARE COMPLEXITY 
FOR LESS MEMORY 

An alternative is to use one additional bit in the 
frame buffer as a cycle counter (a dirty bit) to achieve 
complete overlap of image generation and initialization 
while avoiding the necessity of adding an additional set 
of image and depth buffers. As for Solution #2, this 
will in fact achieve an update rate that is equal to the 
refresh rate for simple scenes (something not achievable 
with the standard software z-buffer algorithm) but at 
far less hardware cost. We assume that the frame 
buffer has been extended to include a one-bit field D 
containing the parity (low-order bit) of k, the update 
cycle counter. 

The frame buffer is initialized once, before the 
actual z-buffer algorithm begins, so that I_0 is the 
“background” color and D is 0, the parity of the first 
image. The settings of I_1 and Z are arbitrary. The 
update processor begins image generation cycle 1 with 
the refresh processor initializing both I_1 and Z. When 
the z-buffer algorithm has generated its first image, the 
refresh processor begins displaying from I_1 and 
initializing both I_0 and Z, but it performs the 
initialization selectively using the cycle number and 
information kept in the D field of each pixel. The 
system assumes a steady-state operation in which the 
two processors synchronize their activity after each 
update cycle through the shared cycle number and the 
D field. 

This is accomplished in the following way. The 
update processor proceeds as it normally would, 
assuming that both I_k and Z have been initialized 
previously by the refresh processor, even though the 
refresh processor may not have visited some (or all) of 
the pixels. The key idea is that each time the update 
processor performs a depth comparison (testing the 
z-depth of an object against the value stored in the Z 
field of a particular pixel) it biases the comparison in 
favor of the new z-depth (that of the object) if the D 
field does not match the parity of the update cycle 
number. This in effect allows the update processor, by 
checking the D field, to detect those pixels for which the 
refresh processor has yet to perform the appropriate 
initialization and to substitute the value infinity for 
whatever (incorrect) z-depth appears in the frame 
buffer. The update processor always re-writes the D 
field with the parity of the current update cycle number 
each time it stores into the frame buffer to avoid the 
problem of the refresh processor mistakenly initializing 
pixels that have already been used for the current 
update cycle. 

PROCEDURE Initialize#3; 
  k := 0; 
  FOR y := maxY DOWNTO 0 DO 
    FOR x := 0 TO maxX DO 
      I_0[x,y] := background; 
      D[x,y] := 0; 
    OD; 
  OD; 
END Initialize#3; 

PROCEDURE Update#3; 
  WHILE true DO 
    k := k+1; 
    FOR every object in the scene DO 
      FOR every pixel (x,y) in the object DO 
        IF (D[x,y] = (k-1) mod 2) 
           OR (object[x,y].z < Z[x,y]) THEN 
          I_{k mod 2}[x,y] := object[x,y].color; 
          Z[x,y] := object[x,y].z; 
          D[x,y] := k mod 2; 
        FI; 
      OD; 
    OD; 
    { mandatory wait for next frame } 
  OD; 
END Update#3; 

PROCEDURE Refresh#3; 
  WHILE true DO 
    FOR y := maxY DOWNTO 0 DO 
      FOR x := 0 TO maxX DO 
        display I_{(k-1) mod 2}[x,y]; 
        IF D[x,y] = (k-1) mod 2 THEN 
          I_{k mod 2}[x,y] := background; 
          Z[x,y] := infinity; 
          D[x,y] := k mod 2; 
        FI; 
      OD; 
    OD; 
  OD; 
END Refresh#3; 

Figure 3. The Z-Buffer Algorithm Using A Dirty Bit 

The refresh processor must modify its operation so 
that it checks the D field before initializing a pixel. 
Were this not the case, it might overwrite a pixel that 
the update processor had already computed, since the 
initialization and update take place simultaneously. 
The refresh processor only performs an initialization 
operation if the D field is not the same parity as the 
current update cycle (put another way, it only initializes 
those pixels whose D fields equal the parity of k-1). 
Pixels being initialized have their D fields set to the 
parity of k (not for the benefit of the update processor, 
but so the refresh processor knows to re-initialize them 
during update cycle k+1). 

For this scheme to work two assumptions must 
hold. The first is that the refresh processor must be 
allowed to complete at least one complete cycle between 
update cycles. As we have already noted, this is a 
reasonable assumption and is easily guaranteed by a 
simple test performed at the start of each frame. The 
second assumption is that the refresh processor makes 
its access to memory in a single atomic operation 
(“read-modify-write”). If this were not the case, the 
refresh processor might overwrite a pixel whose D field 
changed between the time that it was read and the time 
that it was written back to memory. The implication of 
this assumption is that the refresh processor must have 
a reasonably sophisticated interface to frame buffer 
memory — it is fetching multiple pixels in parallel, all 
of which must have their D fields checked and their I 
and Z fields modified in a single memory cycle. 

The procedures Update#3 and Refresh#3 
indicate the z-buffer algorithm using the D field to 
overlap initialization by the refresh processor with 
image generation by the update processor. The 
algorithm as stated assumes that the update processor 
only begins a new cycle during the start of a new 
frame. This can be weakened significantly to the 
requirement that the update processor not begin a new 
cycle until the refresh processor has performed at least 
one complete refresh cycle since the last update cycle 
began (our standard assumption). It may also be 
desirable to insist that the refresh processor not change 
its value of k except at the beginning of a frame (or at 
least a field) because of disturbing video effects. 

Before continuing, the reader may want to verify 
that the algorithm works as stated, with no race 
conditions existing that depend on the order in which 
objects are scan converted by the update processor or 
the order in which pixels are initialized by the refresh 
processor. In doing so, special note should be made of 
the assumption that the refresh processor’s memory 
accesses are atomic. 
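
One way to carry out this verification informally is to simulate the dirty-bit scheme under random interleavings and compare the result with a plain z-buffer. The Python sketch below does this for a one-dimensional row of pixels. It is an illustration under stated assumptions, not a proof: the scene, the helper names, and the model of each refresh visit as an indivisible step are all choices made for the example, and one full refresh pass is assumed to complete within each update cycle.

```python
import math
import random

BACKGROUND = 0


def reference(objects, n):
    """Plain z-buffer result for a row of n pixels."""
    color, depth = [BACKGROUND] * n, [math.inf] * n
    for obj in objects:
        for x, (z, c) in obj.items():
            if z < depth[x]:
                color[x], depth[x] = c, z
    return color


def run_cycle(k, objects, I, Z, D, rng):
    """Interleave update writes and atomic refresh visits in random order."""
    stale, fresh = (k - 1) % 2, k % 2
    update_ops = [("update", x, z, c) for obj in objects for x, (z, c) in obj.items()]
    refresh_ops = [("refresh", x, None, None) for x in range(len(D))]
    while update_ops or refresh_ops:
        # Pick one of the non-empty streams; each stream keeps its own order.
        stream = rng.choice([s for s in (update_ops, refresh_ops) if s])
        kind, x, z, c = stream.pop(0)
        if kind == "update":
            # A stale D field biases the depth test in favor of the object.
            if D[x] == stale or z < Z[x]:
                I[fresh][x], Z[x], D[x] = c, z, fresh
        elif D[x] == stale:
            # Refresh visit: one indivisible read-modify-write on the pixel.
            I[fresh][x], Z[x], D[x] = BACKGROUND, math.inf, fresh


def simulate(frames, n, seed=0):
    rng = random.Random(seed)
    I = [[BACKGROUND] * n, [99] * n]   # I_1 starts with arbitrary contents
    Z = [-1.0] * n                     # Z starts with arbitrary contents
    D = [0] * n
    results = []
    for k, objects in enumerate(frames, start=1):
        run_cycle(k, objects, I, Z, D, rng)
        results.append(list(I[k % 2]))
    return results


# Hypothetical scene: each object maps pixel x -> (z, color).
N = 5
frames = [
    [{0: (5.0, 3), 1: (5.0, 3)}, {1: (2.0, 7), 2: (2.0, 7)}],
    [{2: (1.0, 4)}, {2: (3.0, 9), 3: (3.0, 9)}],
    [{0: (4.0, 1)}],
]
expected = [reference(objs, N) for objs in frames]
for seed in range(20):
    assert simulate(frames, N, seed) == expected
```

The initial Z values are deliberately set to -1.0, a depth that no object could beat, so the check fails unless the D field really does override stale depths.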

The cost in additional memory for this scheme is 
minimal. Only one extra bit is needed at each pixel. 
The increased sophistication in the refresh processor, 
however, is more substantial and may push up the cost 
of the video hardware significantly. Instead of 
performing a simple read-modify-write cycle (as it 
would for Solution #2) in which the new values written 
back to memory are independent of those read from 
memory (at least for the fields that change) the refresh 
processor must now check the status of the D field (a 
single bit) to determine whether the original contents 
are to be left in the I_{k mod 2} and Z fields or if they are 
to receive initialization values. All of this must be 
performed in parallel for anywhere from 16 to 64 
pixels, depending upon the design of the frame buffer 
memory interface. We can avoid this necessity, while 
still retaining the performance, by adding a few more 
bits to each pixel. 

SOLUTION #4 

TRADING BACK SOME OF THE MEMORY 
FOR HARDWARE SIMPLICITY 

PROCEDURE Initialize#4; 
  k := 0; 
  FOR y := maxY DOWNTO 0 DO 
    FOR x := 0 TO maxX DO 
      D_0[x,y] := false; 
      D_1[x,y] := false; 
    OD; 
  OD; 
END Initialize#4; 

PROCEDURE Update#4; 
  WHILE true DO 
    k := k+1; 
    FOR every object in the scene DO 
      FOR every pixel (x,y) in the object DO 
        IF (NOT D_{k mod 3}[x,y]) 
           OR (object[x,y].z < Z[x,y]) THEN 
          I_{k mod 2}[x,y] := object[x,y].color; 
          Z[x,y] := object[x,y].z; 
          D_{k mod 3}[x,y] := true; 
        FI; 
      OD; 
    OD; 
    { mandatory wait for next frame } 
  OD; 
END Update#4; 

PROCEDURE Refresh#4; 
  WHILE true DO 
    FOR y := maxY DOWNTO 0 DO 
      FOR x := 0 TO maxX DO 
        IF D_{(k-1) mod 3}[x,y] THEN 
          display I_{(k-1) mod 2}[x,y] 
        ELSE 
          display background; 
        FI; 
        D_{(k+1) mod 3}[x,y] := false; 
      OD; 
    OD; 
  OD; 
END Refresh#4; 

Figure 4. The Z-Buffer Algorithm Using 3 Dirty Bits 

The reason the refresh processor’s memory 
controller must be so complicated is that it must decide 
(very rapidly) which pixels must have their I and Z 
values modified. This dependence of certain bits within 
the pixel on other bits within the pixel may be difficult 
to determine, especially if the large pixel size requires 
some of the bits to reside on different boards. The 
solution proposed is to eliminate the necessity of 
checking the current pixel contents before deciding 
what to write back into memory during the refresh 
cycle. What we want is an oblivious memory controller, 
one which always writes the same pattern (or at least 
one which always changes the same bits to the same 
values) independent of the current pixel contents. 

The trick is to use three dirty bits, one for each of 
the cycles k-1, k, and k+1. These can then be 
administered independently by the refresh controller. If 
we interpret D_k as a Boolean value (true or false) 
that tells whether the current pixel contents have been 
set during the corresponding update cycle, the job of 
the refresh processor becomes much simpler. During 
steady state, the refresh processor will be fetching pixels 
from I_{k-1} and initializing D_{k+1} while the update 
processor is modifying D_k and Z. 

The only catch to this scheme is that pixels that 
are never rendered by the update processor (because 
they correspond to background) will remain 
uninitialized in their I and Z fields. This is not a 
problem if the refresh processor interprets the D field 
before passing pixel values on to the rest of the video 
chain. It must simply check D_{k-1} and if it is false 
(meaning that this pixel was never set during the 
previous update cycle) then it should pass on 
background color rather than what is stored in I_{k-1}. 
Hardware to perform this task is much simpler than the 
massive parallel checking required for Solution #3 
because it can be performed on a pixel-by-pixel basis 
using techniques similar to lookup tables. 

It is interesting to note that three dirty bits are 
necessary to implement this scheme. Two are not 
enough. D_{k-1} must remain untouched during update 
cycle k or else the refresh processor will become 
confused as to what is or is not background. D_k 
obviously must be initialized before update cycle k 
begins and cannot be changed except by the update 
processor. Neither field is free for initialization during 
update cycle k. Thus a third dirty bit D_{k+1} is required. 
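
The three-bit scheme can be exercised in the same informal way as the single dirty bit. The Python sketch below simulates a row of pixels in which the refresh pass always clears D_{k+1} without inspecting the pixel, and the readout substitutes background wherever D_{k-1} is false. The scene and helper names are hypothetical, D_0 and D_1 are assumed to start false with D_2 arbitrary, and one full refresh pass is assumed to complete within each update cycle.

```python
import math
import random

BACKGROUND = 0


def reference(objects, n):
    """Plain z-buffer result for a row of n pixels."""
    color, depth = [BACKGROUND] * n, [math.inf] * n
    for obj in objects:
        for x, (z, c) in obj.items():
            if z < depth[x]:
                color[x], depth[x] = c, z
    return color


def run_cycle(k, objects, I, Z, D, rng):
    """Run update cycle k; return the image shown by the refresh pass."""
    n = len(Z)
    shown = [None] * n
    update_ops = [(x, z, c) for obj in objects for x, (z, c) in obj.items()]
    refresh_ops = list(range(n))
    while update_ops or refresh_ops:
        if update_ops and (not refresh_ops or rng.random() < 0.5):
            x, z, c = update_ops.pop(0)
            if not D[k % 3][x] or z < Z[x]:
                I[k % 2][x], Z[x], D[k % 3][x] = c, z, True
        else:
            x = refresh_ops.pop(0)
            # Readout: substitute background where D_{k-1} says "never set".
            shown[x] = I[(k - 1) % 2][x] if D[(k - 1) % 3][x] else BACKGROUND
            # Oblivious write: always clear D_{k+1}, whatever the pixel holds.
            D[(k + 1) % 3][x] = False
    return shown


def simulate(frames, n, seed=0):
    rng = random.Random(seed)
    I = [[99] * n, [99] * n]                    # never cleared: arbitrary contents
    Z = [-1.0] * n                              # never cleared: arbitrary contents
    D = [[False] * n, [False] * n, [True] * n]  # D_2 is arbitrary at the start
    # One extra empty cycle at the end displays the last rendered frame.
    return [run_cycle(k, objs, I, Z, D, rng)
            for k, objs in enumerate(frames + [[]], start=1)]


# Hypothetical scene: each object maps pixel x -> (z, color).
N = 5
frames = [
    [{0: (5.0, 3), 1: (5.0, 3)}, {1: (2.0, 7), 2: (2.0, 7)}],
    [{2: (1.0, 4)}, {2: (3.0, 9), 3: (3.0, 9)}],
    [{0: (4.0, 1)}],
]
for seed in range(20):
    shown = simulate(frames, N, seed)
    assert shown[0] == [BACKGROUND] * N                  # nothing rendered yet
    assert shown[1:] == [reference(f, N) for f in frames]
```

Neither the I nor the Z fields are ever cleared in this sketch; the images are still correct because every read of them is guarded by a dirty bit.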

CURRENT IMPLEMENTATIONS 

These algorithms are implemented on the 
Adage/Ikonas RDS 3000. Solution #1 is the standard 
z-buffer algorithm. The only change in the 
implementation from what has been presented is that 


the Z field is typically stored in off-screen pixel memory 
because the frame buffer has only 32 bits per pixel and 
24 are used for intensity. On frame buffers with no 
off-screen memory the entire algorithm can be 
accommodated on-screen by decreasing the number of 
bits allocated to intensity and setting the lookup tables 
appropriately to ignore bits in the Z field as they are 
read out during display. 

Solution #2 (the triple-buffered version) has also 
been implemented on the Ikonas, but with only limited 
intensity and z-depth information due to the 
requirement that all of the fields reside in on-screen 
memory. This stems from the fact that the auto-clear 
feature of the Ikonas (which writes zeros into memory 
during the refresh cycle, subject to a write mask that 
determines the bits in a pixel to be cleared) only 
processes visible pixels. Adopting conventions for 
intensity and z-depth that encode the background color 
and the maximum z-depth as zero allows existing 
hardware to handle the initialization in the refresh 
processor. A more natural encoding is possible if the 
auto-clear feature uses the shading registers (available 
with the Ikonas GM memory boards) to set the 
values to be written into memory, rather than always 
writing zeros into the fields specified by the write mask 
registers. Buffer swapping is accomplished by 
manipulating the crossbar switch and the lookup tables. 


Solution #3 (the dirty bit) is not directly 
implementable on the Ikonas because the auto-clear 
feature has no way of conditionally modifying bit fields. 
This is symptomatic of the objection raised earlier that 
this approach assumes more intelligence in the refresh 
processor’s memory controller than is likely to exist in a 
frame buffer. 

Solution #4 (three dirty bits) is easily 
implemented on the Ikonas using the convention that 
true is a 0 bit and false is a 1 bit. The auto-clear 
feature is used to initialize the dirty bits during refresh 
and a combination of the crossbar switch, the lookup 
tables, and the overlay option is used to modify the 
pixel readout to background color for pixels whose 
values have not been set by the update cycle. (The 
overlay option on the Ikonas allows certain bits — the 
appropriate dirty bit in our case — to select an alternate 
lookup table if the bits are non-zero. By setting all of 
the entries in the alternate lookup table to be the 
background color the correct modification is performed 
as pixels are read from memory during refresh.) 

FURTHER CONSIDERATIONS 

There is a question as to how dynamic the 
allocation of the various fields should be within a pixel. 
On the update processor all of the field selection can be 
fairly easily accomplished using masking and shifting. 
If the intensity and z-depth fields are multiples of eight 
bits, simple byte swapping logic can be used to present 
an interface to the processor that is independent of the 
exact field being selected (i.e., the update processor will 
always present intensity or z-depth values as right- 
adjusted 32-bit values that will be automatically stored 
in the correct fields of a pixel). The dirty bits may 
have to be handled as special cases, although choosing 
appropriate conventions such as forcing all z-depths to 
be positive (and thus saving the sign bit for the dirty 
bit) could be employed. Generalized crossbar switches, 
similar to that on the Adage/Ikonas RDS 3000, used by 
both the update and refresh processors would clearly 
solve any efficiency issues related to this problem. 
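
The sign-bit convention mentioned above can be sketched as follows, assuming a 32-bit z word whose low 31 bits hold a non-negative depth. The names and the word width are assumptions for the example only.

```python
Z_BITS = 32
DIRTY = 1 << (Z_BITS - 1)   # the sign-bit position, reused as the dirty bit
DEPTH_MASK = DIRTY - 1      # the low 31 bits hold the non-negative z-depth


def pack(depth, dirty):
    """Combine a non-negative depth and a dirty flag into one 32-bit word."""
    if not 0 <= depth <= DEPTH_MASK:
        raise ValueError("depth must fit in 31 bits")
    return depth | (DIRTY if dirty else 0)


def unpack(word):
    """Split a 32-bit word back into (depth, dirty)."""
    return word & DEPTH_MASK, bool(word & DIRTY)


word = pack(12345, True)
```

With this packing the dirty bit travels with the depth value in a single field, at the cost of one bit of depth precision.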

In implementing such hardware, some caveats are 
in order. Any registers that allow values or fields to be 
selected should be both writable and readable. If 
necessary, there should be shadow registers so the 
values can be read back, although it is preferable that 
the registers themselves be readable directly. 

It is worth noting that only the refresh circuitry 
needs a read-modify-write cycle that is atomic for the 
schemes to work. This is an important consideration, 
especially if the update processor is composed of 
distributed or pipelined tilers that separate their read 
accesses from their write accesses by multiple cycles. 
Such architectures are particularly useful for z-buffer 
algorithms because they increase parallelism and thus 
the update rate and yet require little or no 
synchronization among the multiple update processors 
because the z-buffer algorithm is itself inherently 
distributed. Our algorithms will not suffer in this case. 

Abstract 

Throughout the computer graphics literature, but in 
modeling and animation lore in particular, the prevalent 
attitude seems to be that scripting and interaction are 
irreconcilably different types of user interface. This 
paper propounds the belief that this dichotomy is a myth 
and gives examples of systems which encourage this 
belief. Two of the systems described are a keyframe 
animation program and a geometric modeling program 
developed at the NYIT Computer Graphics Laboratory. 
Both of these systems are used on a daily basis for 
production, development and research. 

1. Introduction 

Scripting and interaction are widely used interface 
styles for modeling and animation programs. The 
advantages and disadvantages of each are well known. 
Scripting systems allow command sequences to be 
incrementally modified and reexecuted, but allow no 
interactive input of data and operations. Interactive systems 
allow this input, but lose all record of commands 
previously executed, leaving the user to start over if some 
intermediate parameter affecting the final result needs to 
be changed. In almost every publication describing a new 
modeling or animation system, there is a paragraph or 
two which notes these properties and proceeds to ballyhoo 
the advantages of the method chosen. As with many 
other such apparent dichotomies, most designers accept 
without question the division, pick one technique, 
implement it and resign themselves to writing the obligatory 
text. From this lamentable situation we may conclude 
that the production of a user interface which integrates 
scripting and interaction is difficult, but certainly not 
that it is impossible. 

The integration of scripting and interaction is 
worthwhile for several reasons. The first is that a more 
general conceptualization of user interfaces results. This 
has the potential for expanding the set of problems that 
can be solved with a computer. On a more practical note, 
interactive graphics workstations are often expensive and 
therefore scarce. In many graphics houses, the time 
available on such resources is a limiting factor in the amount 
and/or complexity of animation producible. It helps 
greatly if users can work, albeit more slowly, at ordinary 
terminals, using scripting instead of interaction. 

Resource limits are not the only reason to use scripting 
at one workstation and interaction at another. The 
production of computer animation requires many diverse 
activities, most of which are accomplished more efficiently 
with one technique than the other. Some establishments 
may boast interactive and scripting tools for particular 
activities, but rarely do these tools interface to a common 
database format. More often, the systems are 
incompatible “competitors”, perhaps having been written by 
different individuals, and any attempt to use both to solve 
a single problem is frustrated by the need to reorganize 
data, shuffle files about and translate between formats. A 
system which integrates scripting and interaction can be 
used for conceptually diverse activities in whatever mode 
is appropriate. A banner example of the type of 
production which needs this is the now famous robot ant 
animation by Lundin [15], in which the basic 3d path of the 
model is supplied interactively and the dynamics are 
supplied by an explicit computational model. 

For perspective, the more general problem here is the 
development of editing theory. A general theory of 
editing should be independent of the data to be edited, so it 
should be applicable not only to ordinary text editing, but 
to other activities such as geometric modeling (shape 
editing), animation and robot programming (motion 
editing), music production (sound editing), painting 
(image editing) and computer programming (algorithm 
editing). What we would like for any given data format is 
a logical and complete set of functions for manipulating 
that format, and both interactive and scripting interfaces 
to these functions. 

When differentiating between interaction and script¬ 
ing, it is natural at some point to quibble about exactly 
what interactive means. Some would argue that if the 
time around the typical edit-process-view loop of a script¬ 
ing system is short enough, the system is interactive. Is 
there some threshold on this loop time which must be 
met? “Real” time? Historically, the term has been used 
to mark a contrast with batch systems, where the sizes of 
input and output data sets are usually large. Systems 
that produce visible results after small amounts of input
(after every keystroke, say, in a screen editor) we classify
as being definitely interactive. An often cited parameter 
for language-based systems is whether the system is in¬ 
terpretive or compiling. Interpretive systems, however, 
are rarely used without large application dependent 
scripts, while compilation systems can call themselves 
“interactive” simply by connecting their inputs to the 
keyboard. Some systems [13, 16] blur the distinction even 
further by loading compiled code into interpretive front 
ends at run time. The answer to all this, of course, is that 
the set of interactive systems is a fuzzy one, with no clear 
dividing lines along any of these axes. Hence, we discon¬ 
tinue our quibbling here. 

2. Combining Scripting with Interaction 

This section describes the requirements for an integrated
scripting/interactive user interface. Several
programs/systems which implement these requirements in
varying degrees of completeness are drawn upon for examples.
Emacs is an “extensible, customizable, self-documenting”
screen editor developed at MIT [1]. The
Unix implementation with which the author is familiar
[2] supports extension through an interpreter for a Lisp-like
language. Troff [5] (or Scribe or TeX if you prefer)
is a typesetting program which supports macros. The
Macintosh personal computer [17], with its attendant
firmware, is the latest paragon of user interface style.
Em is an animation program written at NYIT for interactive
motion specification (animation) of parameterized
models [8]. It is unique among these example systems in
its use of a large formal grammar [7] (much larger than
that of C [4]) for defining its basic input language.
Gem [16] is an interactive geometric modeling program,
also written at NYIT, and is the current object of this
research.

To develop a model of a user interface which in¬ 
tegrates scripting and interaction, one can think either 
about adding an interactive front end to a scripting sys¬ 
tem, or about adding scripting features to an interactive 
program kernel. Both these approaches have merit. The 
former is more likely to result in a consistent, well- 
designed system, since the basis for the system is presum¬ 
ably a well-designed language. The latter is more likely 
to result in a truly flexible system, since the user interface 
design would proceed unfettered by syntax. Gem was 
developed in the latter mode, and this paper attempts to 
relate the experience gained from that activity. What, 
then, is required of an interactive system which also sup¬ 
ports scripting? 

(1) The first and most obvious requirement is that 
there must be some appropriate script representation. 
Text is an obvious choice, but the problem with text is 
that it forces linear formatting. This may be acceptable 
for document and music production, but for modeling and 
animation, where instancing is a way of life, a graphic 
dataflow format may be more appropriate. (In fact, the 
frustration of ordinary music notation is due in part to its 
limited support for instancing.) Even if a linear format is
tolerable, a more cogent objection to text is that it
represents the script at the wrong level. Dealing with the 
script directly as text ignores the higher-level command 
structure. 

There must be a script representation of every interactive
editing command, and there must be interactive access
to every script command. This brings up the matter
of what is and what isn’t script. It is important to differentiate
commands to an interactive front end from
commands that manipulate the data to be edited. The interactive
interface provides an additional, rather than an
equivalent, means of access to editing commands. This
seems a sensible separation, as no one wants to type in 
reams of text for low level events like tablet movements 
anyway. In fact, it is useful to be able to siphon off the 
raw event stream into a file for testing and demonstration 
purposes (as the Macintosh can do), but there is no need 
for that file format to be integrated with the script 
representation. 

(2) The system must maintain some internal represen¬ 
tation of the script. Unix Emacs uses three represen¬ 
tations for script material, one the lisp-like syntax men¬ 
tioned earlier for defining functions, the second a buffer of 
raw keystrokes for defining “macros” and the third a 
private internal format for supporting ‘undo’. Em begins 
by parsing a script with a yacc-generated parser [6], 
generating a dependency tree and display segments as 
semantic results. Instead of converting the script to a 
syntax tree representation, however (as a compiler would 
for code generation), the original text is stored in memory 
for reparsing during the interactive updating of parameter 
values. 

(3) The system must provide a way to execute a script. 
This is the easy part, as pushing the current input source 
on a stack and reading a script from a file or memory is 
simple on reasonable operating systems. The only added 
wrinkle is that scripts should be able to turn to an ab¬ 
solute source (typically the user) for input. In modeling, 
for instance, a user may want to create a script segment 
that creates an instance of a highly parameterized primi¬ 
tive by filling in some parameters on its own and prompt¬ 
ing for the rest. Emacs provides an assortment of func¬ 
tions for getting values of various types from the 
keyboard. 

(4) The system must be able to create a script from in¬ 
teractive input. Emacs saves raw keystrokes for macro 
definition; function definition from interactive input is 
achieved by mapping keystrokes through a keymap which 
associates functions with keys, and then appending the 
names of the functions to the script. The Synclavier [21] 
provides a facility for “reverse-compiling” a real-time 
keyboard performance into a score. 
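
The keymap approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the key bindings, the command names and the function record_script are hypothetical, and nothing here is taken from the actual Emacs implementation.

```python
# Hypothetical keymap: keystroke -> command name (not real Emacs bindings).
KEYMAP = {"f": "forward-char", "b": "backward-char", "d": "delete-char"}

def record_script(keystrokes, keymap=KEYMAP):
    """Translate raw keystrokes into a script of command names."""
    script = []
    for key in keystrokes:
        name = keymap.get(key)
        if name is not None:   # unbound keys contribute nothing to the script
            script.append(name)
    return script

print(record_script("ffd"))
```

The script stores command names rather than keystrokes, so it stays meaningful even if the key bindings are changed later.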

Creating scripts from interactive input is where the 
real subtleties begin to arise. For example, most scripted 
operations will be used as subroutines would be in an ordinary
programming language. (This restriction is reasonable,
since in many languages everything is part of some
subroutine.) The context in which this “subroutine” is to
be executed will dictate end conditions for the recording 
of the script. One solution, used by Emacs for both func¬ 
tion and macro definition, is to provide two special inter¬ 
active commands to start and stop the translation of in¬ 
teractive input into script. These commands must be 
treated specially in all contexts, as their meaning in a 
script is unclear at best, and they should not be translated 
into script when given interactively. 

Some decision must be made as to whether the input 
being translated is to be executed at the time of record¬ 
ing. A good solution to this is to execute the input when 
it’s coming in interactively, and to simply store it away 
when it’s not. Emacs, being “interactive”, executes its in¬ 
put while translating it, which makes clear the effects the 
script will have when executed later. At the other end of 
the spectrum, Troff defines very clearly a copy mode 
through which input destined for macros is read. 

Above, it was maintained that a direct translation of 
the low level event stream is not an appropriate script 
representation. The open question is this: if raw events 
aren’t going to be represented directly in the script, how 
are they going to be represented? High bandwidth events 
such as tablet and dial movements present a formidable 
data pollution problem, so some means of compressing 
these events into a compact interpretation needs to be 
found. 

(5) The system must provide a way to edit a script. 
This is important, since very little time is spent creating 
scripts as opposed to editing them (witness software 
production). For this function, Emacs “cheats” and uses 
itself to do the editing. It is, after all, a text editor, and 
its only events are characters from the keyboard. Inter¬ 
active modeling and animation programs which employ an 
arsenal of interactive devices are not so fortunate. More 
special commands are needed here to position the 
“cursor” where translated script will be inserted, and to 
delete script elements. Em uses a compromise solution to 
allow the user to edit the set of current input modes 
(relationships between logical device values and parameter 
values), which is to edit a text representation of the input 
modes with a text editor and then read them back in. 
The text representation is actually a subset of the com¬ 
mand language, and the same parser is used for it that is 
used for the input script. 

The support of editing is the most difficult part of in¬ 
tegrating scripting and interaction. As an example, con¬ 
sider the support necessary for the special case of ‘undo’. 
At any point in the command sequence, the user can issue 
an undo command which undoes the previous editing 
command, and sequences of undo commands have 
cumulative effect. (An interesting side issue is whether 
the undo commands should be part of the script, that is, 
whether you should be able to undo the undoing.) There 
seem to be only two alternatives for supporting this 
functionality: either all the operators must be invertible, 
or the entire state of the system must be snapshotted 
after every command, both fairly daunting propositions.
In the more general case, commands at any point in the
script may be changed. Assuming that an interactive sys¬ 
tem wants to keep up to date versions of the final results 
on display, this means that to minimize script reexecution 
time, every intermediate result must be saved. This 
generates an obvious conflict with storage requirements. 
Faced with this, one gives up and accepts the difference 
between compiled and uncompiled forms of the data, as 
well as script reevaluation time. 
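
The two alternatives for supporting undo can be contrasted in a small sketch. The class names and the toy states below are assumptions made for illustration; they do not describe Gem, Emacs or Em. The snapshot variant saves the state just before each command, which is the same state that exists just after the previous one.

```python
import copy

class InvertibleHistory:
    """Undo by applying a recorded inverse for each command."""
    def __init__(self, state):
        self.state = state
        self.inverses = []

    def apply(self, do, undo):
        self.state = do(self.state)
        self.inverses.append(undo)      # every operator must supply an inverse

    def undo(self):
        if self.inverses:
            self.state = self.inverses.pop()(self.state)

class SnapshotHistory:
    """Undo by restoring a full copy of the state."""
    def __init__(self, state):
        self.state = state
        self.snapshots = []

    def apply(self, do):
        self.snapshots.append(copy.deepcopy(self.state))  # storage grows per command
        self.state = do(self.state)

    def undo(self):
        if self.snapshots:
            self.state = self.snapshots.pop()

h = InvertibleHistory(10)
h.apply(lambda s: s + 5, lambda s: s - 5)
h.undo()

s = SnapshotHistory([1, 2])
s.apply(lambda st: st + [3])
s.undo()
```

The first class pays in programming effort (an inverse for every operator), the second in storage, which is the trade-off the text describes.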

3. An Implementation 

This methodology is being explicitly applied in the
design of the user interface to Gem, our geometric
modeling program at NYIT. This program sports the
usual socially acceptable features such as on-screen menuing,
overlapping windows (that support vector data), icons
and on-line help, as well as some more unusual features
such as run-time loading of compiled object code and a
rich symbol table structure. Modeling operations include
polygon digitizing, translational and rotational sweeping, 
mirroring, stellation, truncation, boolean set operations, 
offsetting and quadratic, bicubic and geodesic subdivision 
[9, 10]. Output databases are produced for our animation 
and rendering software. 

The modeler is extensible at two different levels. The 
scripting facility described below provides a means for or¬ 
dinary users to write new functions in the command lan¬ 
guage of the modeler. The menu structure is dynamically 
modifiable, so these functions can be inserted at ap¬ 
propriate places or just executed from the keyboard. The 
run-time loader and symbol table structure are used by 
developers and application programmers to extend the 
program in the base programming language. After a run¬ 
time package has been loaded in, the program appears for 
all intents and purposes as it would have if the package 
had been linked in by the system maintainers before 
releasing it. 

Scripts are represented both externally and internally 
as text. Execution is simple, given the i/o redirection 
facilities of Unix [3]. The problem of turning to the user 
for direct input has been solved by tagging the data 
rather than calling a special routine. (In fact there is no 
explicit syntax in the input language for procedure calls.) 
When interactive input is desired in a script, a special 
token is inserted which is understood by all the data col¬ 
lecting routines to mean: “ignore this token, push the cur¬ 
rent input channel on the stack and use the interactive in¬ 
put channel.” (This is similar to the method used by the 
Unix command interpreter [11], which, alas, also does not 
support subroutines.) 
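
A minimal sketch of this tagged-input idea follows. The token "?", the class InputStack and its methods are hypothetical names, and the sketch simplifies the scheme: it reads a single value from the interactive channel when it meets the token, rather than switching channels for an extended period.

```python
from collections import deque

ASK_USER = "?"   # hypothetical token meaning "get this value interactively"

class InputStack:
    def __init__(self, interactive):
        self.interactive = interactive   # callable returning one user token
        self.stack = []                  # suspended script sources
        self.current = None              # script currently being read

    def run_script(self, tokens):
        """Push the current source and start reading from a new script."""
        if self.current is not None:
            self.stack.append(self.current)
        self.current = deque(tokens)

    def next_token(self):
        """Return the next input token from the script or from the user."""
        # Pop exhausted scripts until a source with input remains.
        while self.current is not None and not self.current:
            self.current = self.stack.pop() if self.stack else None
        if self.current is None:
            return self.interactive()    # no script active: ask the user
        token = self.current.popleft()
        if token == ASK_USER:
            return self.interactive()    # token is dropped, user supplies value
        return token

answers = iter(["part2"])
src = InputStack(lambda: next(answers))
src.run_script(["intersect-parts", "part1", "?"])
print([src.next_token() for _ in range(3)])
```

Because every data-collecting routine calls next_token, none of them needs to know whether a value came from a script or from the user.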

Script generation is accomplished by sending strings to
a script from a small number of places in the program.
The names of functions are emitted just prior to invocation.
When functions need arguments (integers, strings,
parts, etc.) they call ‘get’ functions which prompt for and
return a variable of the appropriate type. These functions
send the argument values obtained to the script just
before returning. This scheme limits the number of places
in the program that have to know about script generation
to one per data type.
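
The scheme can be illustrated with a short sketch. All names here (emit, get_int, get_string, invoke, scale_part) are hypothetical and chosen for the example; Gem's real routines are not shown in this paper.

```python
script = []   # the script being generated, as a list of text tokens

def emit(token):
    """The single place where text is appended to the script."""
    script.append(str(token))

def get_int(source):
    """'Get' function for integers: obtain a value, then record it."""
    value = int(next(source))
    emit(value)
    return value

def get_string(source):
    """'Get' function for strings: obtain a value, then record it."""
    value = next(source)
    emit(value)
    return value

def invoke(name, func, source):
    """Emit the function name just before invoking the function."""
    emit(name)
    return func(source)

def scale_part(source):          # hypothetical modeling command
    part = get_string(source)
    factor = get_int(source)
    return (part, factor)

invoke("scale-part", scale_part, iter(["cube", "3"]))
print(" ".join(script))
```

The command itself contains no scripting code; only invoke and the per-type get functions know that a script is being written.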

Because of Gem’s lack of a formal grammar to define 
the input language, a practical problem in the creation of 
scripts is the difficulty of determining where a macro 
ends. When the define-macro command is given, the 
translation of events into script is enabled. When this 
command occurs in a script, commands are not executed 
during recording, but consumed by the define-macro com¬ 
mand until it sees the end-macro command. In this mode, 
Gem can be confused if the end-macro command appears 
inside the macro. When the input is interactive, com¬ 
mands are executed during recording. This method does 
not become confused since the commands consume their 
arguments. 

One of Gem’s user interface features which compli¬ 
cates script generation is its use of cancelable commands. 
Each of the get functions provides a ‘cancel’ option which 
causes the calling function to abort. To handle this cor¬ 
rectly, the generation of script must be deferred until 
commands execute to completion. Presently, Gem makes 
no attempt to handle this problem. 
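
One way the deferral could be done is sketched below. This is not something Gem implements, as noted above; the exception Cancelled and the functions get_value and run_command are hypothetical names used only to show the idea of buffering script text until a command completes.

```python
class Cancelled(Exception):
    """Raised by a 'get' function when the user chooses 'cancel'."""

def get_value(source, pending):
    """Hypothetical 'get' function: read one value or abort on 'cancel'."""
    value = next(source)
    if value == "cancel":
        raise Cancelled()
    pending.append(value)        # buffered, not yet part of the script
    return value

def run_command(name, arg_count, source, script):
    """Run a command; commit its script text only if it completes."""
    pending = [name]
    try:
        args = [get_value(source, pending) for _ in range(arg_count)]
    except Cancelled:
        return None              # buffered text is discarded
    script.extend(pending)       # command completed: commit to the script
    return args

script = []
run_command("intersect-parts", 2, iter(["part1", "cancel"]), script)
run_command("intersect-parts", 2, iter(["part1", "part2"]), script)
print(script)
```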

Another interesting issue in script generation is how to 
handle modelessness. Gem makes an attempt to be as 
modeless as possible. In the get functions, any input 
which isn’t of the type needed is passed through to a 
recursive invocation of the command interpreter. For ex¬ 
ample, while selecting a part, the user can move the 
camera, make new parts, change the windows around, ac¬ 
cess the on-line help database and so on. This is one case 
in which the lack of a formal grammar actually makes the 
script generation easier: Gem simply prints script com¬ 
mands as they are evaluated. For example: 

intersect-parts part1 move-camera part2

In contrast, if the script format were Lisp, for example, 
some means would have to be found of splicing in the ex¬ 
tra commands: 

(intersect-parts part1 (progn (move-camera) part2))

The conversion of tablet and dial events to script is 
dependent on context. For example, in several contexts 
tablet picks are used to select a current element (solid, 
surface, polygon, edge, vertex, etc.). In these contexts, 
the tablet pick is considered to be an alias for the ap¬ 
propriate ‘select’ command, so the script result is the 
name of the command plus the name of the object 
selected. Tablet and dial movements, the really high 
bandwidth events, are dealt with by considering them to 
be inputs to fancy ‘get’ functions. A common use for 
these events, for example, is in the construction of 
positioning matrices. When the function which collects 
these events into a matrix returns, it emits an appropriate 
script representation just like the other ‘get’ functions. 

Two approaches to script editing have been developed.
The first is the compromise adopted by Emacs and Em:
dump the script into a file, invoke a text editor on the file 
and read the file back in. This was implemented as a first 
cut because it was easy. It also has the advantage of be¬ 
ing editor-independent. 

The more ambitious script editing scheme, still under 
development, is to open a text editor as a concurrent 
process. The editor provides two windows, one an inter¬ 
active channel to the normal terminal i/o of the modeler 
process, and the other a buffer for the script to be edited. 
Script emitted from the modeler is sent to the editor’s 
script buffer and inserted at the current cursor location of 
that buffer. Hence, the commands of the text editor are 
available in the script window and the commands of the 
modeler are available in the other. The drawbacks of this 
approach are that all the terminal data for the modeler 
must go through the editor (a performance issue), and, 
more importantly, only one editor (namely Emacs) can 
support this mode of operation. If the window system 
were below the process level, things would be much easier, 
but unfortunately, our window system is part of the 
modeler process. 

4. Discussion and Conclusions 

Two major improvements are planned for the Gem 
user interface. The first is to reimplement the upper level 
of the interface in Lisp [12]. This would eliminate many 
of the problems that arise because of the lack of a formal 
grammar. The interactive command interface would stay 
the same, but scripting would be done in Lisp. This has 
only recently become practical due to the development of 
a programming environment [14] which supports both C 
and Lisp, and guarantees that the conversion can be done 
incrementally, without discarding the several tens of 
thousands of lines of existing modeling code. 

The second improvement being considered is a provi¬ 
sion for user-configurable device interaction via function 
networks [24]. A function network package has been im¬ 
plemented and tested as a run-time package but has not 
yet been bound into the program. The use of function 
networks at the modeling level as well would provide a 
means of representing the dataflow aspects of a model ex¬ 
plicitly. This would also make the storage of intermediate 
results possible, which, as mentioned earlier, opens the 
door to the optimization of script execution time, provid¬ 
ing one is prepared to pay the price in storage require¬ 
ments. 

The objections raised earlier to text as a script 
representation were based in part on the limited intel¬ 
ligence of current text editors. This objection could pos¬ 
sibly be eliminated through the use of syntax-directed 
editing, a technique which has been under development 
by the programming language community for some time. 
The script form would be text, but the text editor would 
know the syntax of the language so its elements could be 
handled at a reasonable level. 

As conjectured in the introduction, the implementation
of a user interface kernel which provides integrated
support for scripting and interaction has proven difficult, 
but not impossible. The work done so far has begun to 
provide the insights necessary for the implementation of a 
truly useful tool for animation production, and the 
methodology developed seems applicable to a wide range 
of data editing problems. 

5. Addendum: Other Potential Applications 

In addition to modeling and animation for computer 
graphics, another application area that stands to benefit 
greatly from the integration of scripting and interaction is 
music production. Composers should be able to create a 
score in some reasonable frequency content versus time 
notation (using either traditional staves or some alter¬ 
native notation [18]), play the score through some instru¬ 
ment, play the instrument to generate a score, and finally 
edit the score using either instrumental or scripted input. 
In fact, a first cut at a system with these capabilities can 
be assembled with current technology [19, 20, 21, 22]. 

Another area that uses hybrid scripting/interaction 
techniques already is robot programming. Unimation 
Puma series robots [23], for example, come with a pro¬ 
gramming language for scripting and a manual control 
box for interactively positioning the robot in key con¬ 
figurations. Once a configuration is set, a line of text can 
be inserted in the program buffer with a press of a button 
on the manual control. Such a programming environment 
could be much more generally useful if fleshed out with 
the remainder of the capabilities described in this paper.

SURVEY OF TEXTURE MAPPING

ABSTRACT

Texture mapping is one of the most successful new techniques in high qual¬ 
ity image synthesis. Its use can enhance the visual richness of raster scan 
images immensely while entailing only a relatively small increase in compu¬ 
tation. The technique has been applied to a number of surface attributes: 
surface color, surface normal, specularity, transparency, illumination, and 
surface displacement, to name a few. Although the list is potentially end¬ 
less, the techniques of texture mapping are essentially the same in all cases. 
We will survey the fundamentals of texture mapping, which can be split into 
two topics: the geometric mapping which warps a texture onto a surface, 
and the filtering which is necessary in order to avoid aliasing. An extensive 
bibliography is included. 

KEYWORDS: texture mapping, texture filter, space variant filter, antialiasing.


INTRODUCTION 
Why Map Texture? 

In the quest for more realistic imagery, one of the most frequent criticisms of 
early synthesized raster images was the extreme smoothness of surfaces - 
they showed no texture, bumps, scratches, dirt, or fingerprints. Realism 
demands complexity, or at least the appearance of complexity. Texture 
mapping is a relatively efficient means to create the appearance of complex¬ 
ity without the tedium of modeling and rendering every 3-D detail of a sur¬ 
face. 

The study of texture mapping is valuable because its methods are applicable 
throughout computer graphics and image processing. Geometric mappings 
are relevant to the modeling of parametric surfaces in CAD and to general 
2-D image distortions for image restoration and artistic uses. The study of 
texture filters leads into the development of space variant filters, which are 
useful for image processing, artistic effects, depth-of-field simulation, and 
motion blur. 

Definitions 

We define a texture rather loosely: it can be either a texture in the usual 
sense (e.g. cloth, wood, gravel) - a detailed pattern which is repeated many 
times to tile the plane, or more generally, a multidimensional image which is 
mapped to a multidimensional space. The latter definition encompasses 
non-tiling images such as billboards and paintings. 

Texture mapping means the mapping of a function onto a surface in 3-D. 
The domain of the function can be one, two, or three-dimensional, and it can 
be represented either by an array or by a mathematical function. For exam¬ 
ple, a 1-D texture can simulate rock strata; a 2-D texture can represent 
waves, vegetation [Nor82], or surface bumps [Per84]; a 3-D texture can
represent clouds [Gar85], wood [Pea85], or marble [Per85a]. For our pur¬ 
poses textures will usually be 2-D arrays. 

The source image (texture) is mapped onto a surface in 3-D object space
which is then mapped to the destination image (screen) by the viewing projection.
Texture space is labeled (u,v), object space is (x_o,y_o,z_o), and
screen space is (x,y).

We assume the reader is familiar with the terminology of 3-D raster graphics
and the issues of antialiasing [Rog85], [Fol82].

Uses for Texture Mapping 

The possible uses for mapped texture are myriad. Some of the parameters 
which have been texture mapped to date are: surface color (the most com¬ 
mon use) [Cat74], specular reflection [Bli76], normal vector perturbation 
(“bump mapping”) [Bli78a], specularity (the glossiness coefficient) 
[Bli78b], transparency [Smi79], diffuse reflection [Mil84], surface displace¬ 
ment and mixing coefficients [Coo84b]. 

Illumination Mapping 

Mapping specular and diffuse reflection is rather different from mapping 
other parameters, since these maps are not associated with a particular object 
in the scene, but to an imaginary infinite radius sphere, cylinder, or cube sur¬ 
rounding the scene [Gre86a]. Whereas standard texture maps are indexed by 
the surface parameters u and v, a specular reflection map is indexed by the
reflected ray direction [Bli76] and the diffuse reflection map is indexed by 
the surface normal direction [Mil84]. The technique can be generalized for 
transparency as well, indexing by the refracted ray direction [Kay79]. In the 
special case that all surfaces have the same reflectance and they are viewed 
orthographically the total reflected intensity is a function of surface orienta¬ 
tion only, so the diffuse and specular maps can be merged into one [Hor81]. 

Illumination mapping, as these techniques are called, facilitates the simulation
of complex lighting environments, since the time required to shade a
point is independent of the number of light sources. Other reasons for its 
recent popularity are: it is one of the few demonstrated techniques for 
antialiasing highlights [Wil83], it is an inexpensive approximation to ray 
tracing for mirror reflection, and to radiosity methods [Gor84] for diffuse 
reflection of objects in the environment. Efficient filtering is especially 
important for illumination mapping, where variations in surface curvature 
often necessitate broad areas of the sky to be averaged. 

Since specular reflection varies as a function of the viewing direction, it is 
most conveniently computed on the fly, as in ray tracing. Diffuse reflection 
of the environment, however, has not yielded to ray tracing even when 
stochastic methods [Coo84a] are used. The problem is that diffuse reflection 
scatters light over an entire hemisphere, not a narrow cone, as does specular 
reflection. Fortunately diffuse reflection is independent of viewing direction, 
so the incident illumination at each surface point can be precomputed and 
treated as a texture [Coo84b]. Previous methods have approximated this 
using polygon subdivision to model hard shadows [Ath78], soft shadows 
[Nis83], beams of light [Hec84], or indirect illumination [Gor84]. With the 
development of more efficient algorithms for its computation, incident 
illumination promises to be a common use for textures in the future. 

Even when direct support for illumination mapping is unavailable, tricks can
be employed which give a visually acceptable approximation. Rather than 
calculate the exact ray direction at each pixel, one can compute the reflected 
or refracted ray direction only at polygon vertices and interpolate it, in the 
form of u and v texture indices, across the polygon using standard methods. 
This approximation is similar to that made by beam tracing [Hec84]. 


† Current address: Pacific Data Images, 1111 Karlstad Dr., Sunnyvale, CA 94089, USA

MAPPING

The mapping from texture space to screen space is split into two phases.
First is the surface parameterization which maps texture space to object 
space, followed by the standard modeling and viewing transformations 
which map object space to screen space, typically with a perspective projec¬ 
tion [Fol82]. These two mappings are composed to find the overall 2-D tex¬ 
ture space to 2-D screen space mapping, and the intermediate 3-D space is 
often forgotten. This simplification suggests texture mapping’s close ties 
with image warping and geometric distortion. 

Scanning Order 

There are three general approaches to drawing a texture mapped surface: 
scanning in screen space, scanning in texture space, and two-pass methods. 

Traversing the screen in scanline order, sometimes called inverse mapping, 
is the most common method. For each pixel in screen space the preimage of 
the pixel in texture space is found and this area is filtered. This method is 
preferable when the screen must be written sequentially (e.g. when output is 
going to a film recorder), the mapping is readily invertible, and the texture is 
random access. 

Traversing the texture in scanline order may seem simpler than scanning the 
screen since inverting the mapping is unnecessary in this case, but doing this 
correctly is subtle. Unfortunately, uniform sampling of texture space does 
not guarantee uniform sampling of screen space except for affine (linear) 
mappings, so for non-affine mappings texture subdivision must often be 
done adaptively. Otherwise, holes or overlaps will result in screen space. 
Scanning the texture is preferable when either (a) the texture to screen map¬ 
ping is difficult to invert, or (b) the texture image must be read sequentially 
(e.g. from tape) and will not fit in random access memory. 

Two-pass methods decompose a 2-D mapping into two 1-D mappings, the 
first applied to the rows of an image and the second applied to the columns 
[Cat80]. These methods work particularly well for affine and perspective 
mappings, where the warps for each pass are linear or rational linear func¬ 
tions. Because the mapping and filter are 1-D they are amenable to stream 
processing techniques such as pipelining. Two-pass methods are preferable 
when the source image cannot be randomly accessed but has rapid row and
column access, and a buffer for the intermediate image is available.

Parameterization 

In order to map a 2-D texture onto a surface in 3-D, a parameterization of 
the surface is needed. This comes naturally for surfaces which are defined 
parametrically, such as bicubic patches, but less naturally for other surfaces 
such as polygons and quadrics, which are usually defined implicitly. The 
parameterization can be by surface coordinates u and v, as in standard texture
mapping, by the direction of a normal vector or light ray, as in illumination
mapping, or by spatial coordinates x_o, y_o, and z_o for objects which are
to appear carved out of a solid material.

Parameterizing Planes and Polygons 

We will examine mappings for planar polygons in some detail. First we dis¬ 
cuss the parameterization and later we discuss the composite mapping. 

A triangle is easily parameterized by specifying the texture space coordinates
(u,v) at each of its three vertices. This defines an affine mapping
between texture space and 3-D object space; each of x_o, y_o, and z_o has the
form Au+Bv+C. For polygons with more than three sides, nonlinear functions
are needed in general, and one must decide if the flexibility is worth the
expense. The alternative is to assume linear parameterizations, and subdivide
into triangles where necessary.
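
As an illustration of the triangle case, the coefficients A, B and C for each object space coordinate can be found by solving a small linear system. The sketch below uses Cramer's rule; the function names affine_coeffs and triangle_map are assumptions made for this example and do not come from the survey.

```python
def affine_coeffs(uv, values):
    """Solve value = A*u + B*v + C from three (u, v) -> value samples."""
    (u0, v0), (u1, v1), (u2, v2) = uv
    x0, x1, x2 = values
    du1, dv1 = u1 - u0, v1 - v0
    du2, dv2 = u2 - u0, v2 - v0
    det = du1 * dv2 - du2 * dv1
    if det == 0:
        raise ValueError("texture coordinates are collinear")
    a = ((x1 - x0) * dv2 - (x2 - x0) * dv1) / det
    b = (du1 * (x2 - x0) - du2 * (x1 - x0)) / det
    c = x0 - a * u0 - b * v0
    return a, b, c

def triangle_map(uv, points):
    """Return a function mapping (u, v) to an object space point (x, y, z)."""
    coeffs = [affine_coeffs(uv, [p[k] for p in points]) for k in range(3)]
    def mapping(u, v):
        return tuple(a * u + b * v + c for a, b, c in coeffs)
    return mapping

f = triangle_map([(0, 0), (1, 0), (0, 1)], [(1, 1, 0), (3, 1, 0), (1, 4, 2)])
print(f(0.5, 0.5))
```

Each vertex is reproduced exactly, and points inside the triangle are interpolated linearly in u and v.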

One nonlinear parameterization which is sometimes used is the bilinear 
patch: 

A E I 
B F J 

[x 0 y 0 2 0 ] = [uv u v 1] c q K 
DHL 

which maps rectangles to planar or nonplanar quadrilaterals [Hou83]. This 
parameterization has the strange property that it preserves lines and equal 
spacing along vertical and horizontal texture axes, but preserves neither 
along diagonals. The use of this parameterization for planar quadrilaterals is
not recommended, however, since inverting it requires the solution of
quadratic equations.
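The spacing property can be checked numerically. The sketch below evaluates a bilinear patch using the row-vector convention [uv u v 1] times a 4x3 coefficient matrix; the quadrilateral and the name `bilinear_patch` are hypothetical choices for illustration.

```python
def bilinear_patch(M, u, v):
    """Evaluate [uv, u, v, 1] * M, where M is a 4x3 coefficient matrix.

    Returns the object space point (x_o, y_o, z_o).
    """
    basis = (u * v, u, v, 1.0)
    return tuple(sum(basis[r] * M[r][c] for r in range(4)) for c in range(3))


# Hypothetical planar quadrilateral with corners
# (0,0,0), (1,0,0), (0,1,0), (2,2,0) at (u,v) = (0,0), (1,0), (0,1), (1,1).
M = [
    (1, 1, 0),   # uv row: P11 - P10 - P01 + P00
    (1, 0, 0),   # u row:  P10 - P00
    (0, 1, 0),   # v row:  P01 - P00
    (0, 0, 0),   # constant row: P00
]

edge_mid = bilinear_patch(M, 0.5, 1.0)   # halfway along the edge v = 1
diag_mid = bilinear_patch(M, 0.5, 0.5)   # halfway along the diagonal u = v
```

Here `edge_mid` is the true midpoint of the edge from (0,1,0) to (2,2,0), while `diag_mid` is not the midpoint (1,1,0) of the diagonal, which is the unequal spacing described above.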

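A better parameterization for planar quadrilaterals is the ‘perspective
mapping’ [Hec83]:

\[
\begin{bmatrix} x_o w_o & y_o w_o & z_o w_o & w_o \end{bmatrix}
=
\begin{bmatrix} u & v & 1 \end{bmatrix}
\begin{bmatrix} A & D & G & J \\ B & E & H & K \\ C & F & I & L \end{bmatrix}
\]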

where w_o is the homogeneous coordinate which is divided through to calculate
the true object space coordinates [Rob66], [Fol82]. x_o, y_o, and z_o are
thus of the form (Au+Bv+C)/(Ju+Kv+L). The perspective mapping
preserves lines at all orientations but sacrifices equal spacing. Note that a 
‘perspective mapping’ might be used for the parameterization whether or not 
the viewing projection is perspective. 

Projecting Polygons 
Orthographic Projection 

Orthographic projections of linearly-parameterized planar textures have a 
linear composite mapping. The inverse of this mapping is linear as well, of 
course. This makes them particularly easy to scan in screen order: the cost
is only two adds per pixel, disregarding filtering [Smi80].
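The two adds per pixel can be seen in a short sketch of screen order scanning. It assumes the inverse mapping has already been written as u = a*x + b*y + c and v = d*x + e*y + f; the coefficient names and the function `scan_affine` are illustrative, not from [Smi80].

```python
def scan_affine(inv, width, height):
    """Yield (x, y, u, v) in screen order for an affine inverse mapping.

    inv -- coefficients (a, b, c, d, e, f) with
           u = a*x + b*y + c and v = d*x + e*y + f
    """
    a, b, c, d, e, f = inv
    for y in range(height):
        u, v = b * y + c, e * y + f   # texture coordinates at the scanline start
        for x in range(width):
            yield x, y, u, v
            u += a                    # the two adds per pixel
            v += d


samples = list(scan_affine((0.5, 0.0, 1.0, 0.0, 0.25, 2.0), 3, 2))
```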

It is also possible to perform affine mappings by scanning the texture, producing
the screen image in non-scanline order. Most of these methods are
quite ingenious.

Braccini and Marino show that by depositing the pixels of a texture scanline 
along the path of a Bresenham digital line, an image can be rotated or 
sheared [Bra80]. To fill the holes which sometimes result between adjacent 
lines, they draw an extra pixel at each kink in the line. This results in some 
redundancy. They also use Bresenham’s algorithm [Bre65] in a totally different
way: to scale an image. This is possible because distributing m
source pixels to n screen pixels is analogous to drawing a line with slope
n/m. Braccini and Marino use the simplest filtering: point sampling.
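The analogy between scaling and line drawing can be sketched with an integer error accumulator, in the spirit of Bresenham stepping. This is an illustrative reconstruction under that assumption and not the algorithm as published in [Bra80]; the name `bresenham_scale` is hypothetical.

```python
def bresenham_scale(src, n):
    """Resample the m pixels of src to n pixels by point sampling.

    Only integer adds and compares are used: the error term advances by m
    for every output pixel and the source index steps whenever it reaches n.
    """
    m = len(src)
    out = []
    i = 0      # current source index
    err = 0    # accumulated error, in units of 1/n of a source pixel
    for _ in range(n):
        out.append(src[i])
        err += m
        while err >= n:
            err -= n
            i += 1
    return out


stretched = bresenham_scale([10, 20, 30, 40], 8)
shrunk = bresenham_scale([1, 2, 3, 4, 5, 6], 3)
```

Output pixel j receives source pixel floor(j*m/n), so stretching repeats pixels and shrinking skips them, with no filtering beyond point sampling.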

Weiman also uses Bresenham’s algorithm for scaling, but does not draw 
diagonally across the screen [Wei80]. Instead he decomposes rotation into 
four scanline operations: xscale, yscale, xshear, and yshear. He does box 
filtering by averaging together several phases of the scaled image. 

Cohen draws texture scanlines diagonally across the screen like [Bra80], but 
does not use their scaling trick [Coh84]. He is able, however, to eliminate 
the holes and redundancy of [Bra80] by carefully nesting the digital lines. 
Cohen also demonstrates the algorithm’s applicability to antialiased line 
drawing. 

Perspective Projection 

A naive method for texture mapping in perspective is to linearly interpolate 
the texture coordinates u and v along the sides of the polygon and across 
each scan line, much as Gouraud or Phong shading [Fol82] is done. However,
linear interpolation will never give the proper effect of nonlinear
foreshortening [Smi80], it is not rotationally invariant, and the error is obvious
in animation. One solution is to subdivide each polygon into many small
ones. The correct solution, however, is to replace linear interpolation with
the true formula, which requires a division at each pixel. In fact, Gouraud
and Phong shading in perspective, which are usually implemented with
linear interpolation, share the same problem, but the errors are so slight that
they’re rarely noticed.
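The difference between the naive and the correct interpolation can be illustrated along a single span. The sketch below uses one common formulation, assumed here rather than taken from [Smi80]: u/w and 1/w are interpolated linearly in screen space and divided at each pixel, where w is the homogeneous coordinate at each endpoint. All names are illustrative.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def naive_u(u0, u1, t):
    """Texture coordinate by plain linear interpolation in screen space."""
    return lerp(u0, u1, t)


def perspective_u(u0, w0, u1, w1, t):
    """Texture coordinate with one division per pixel.

    u/w and 1/w vary linearly across the screen, so both are interpolated
    and their quotient recovers u.
    """
    num = lerp(u0 / w0, u1 / w1, t)
    den = lerp(1.0 / w0, 1.0 / w1, t)
    return num / den


# Hypothetical span: u runs from 0 to 1 while w runs from 1 to 3.
u_naive = naive_u(0.0, 1.0, 0.5)
u_true = perspective_u(0.0, 1.0, 1.0, 3.0, 0.5)
```

At the screen midpoint the naive value is 0.5 while the divided value is 0.25, because the far half of the span is foreshortened. When w is the same at both ends the two methods agree.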

Perspective mapping of an affine or perspective parameterized plane is: 

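\[
\begin{bmatrix} xw & yw & w \end{bmatrix}
=
\begin{bmatrix} u & v & 1 \end{bmatrix}
\begin{bmatrix} A & D & G \\ B & E & H \\ C & F & I \end{bmatrix}
\]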

This mapping is analogous to the more familiar 3-D perspective transformation
using 4x4 homogeneous matrices. The inverse of this mapping (calculated
using the adjoint matrix) is of the same form, as is the composition of
two of these mappings. Consequently a plane using a perspective parameterization
which is viewed in perspective will have a compound mapping
which is of the perspective form. The perspective mapping simplifies to the
affine form when G and H are zero, which occurs when the surface is parallel
to the projection plane.
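A small sketch can confirm that the adjoint matrix inverts the mapping up to a scale factor, which the homogeneous divide removes. The row-vector convention [u v 1] M follows the text; the function names and the sample matrix are assumptions made for illustration.

```python
def adjoint3(M):
    """Adjoint (transposed cofactor matrix) of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]


def apply_h(p, M):
    """Map a 2-D point through M as a row vector, then divide by w."""
    x, y = p
    X = x * M[0][0] + y * M[1][0] + M[2][0]
    Y = x * M[0][1] + y * M[1][1] + M[2][1]
    W = x * M[0][2] + y * M[1][2] + M[2][2]
    return X / W, Y / W


# Hypothetical texture-to-screen matrix; the last column holds G, H, I.
M = [[2.0, 0.0, 0.5],
     [0.0, 3.0, 0.0],
     [1.0, 1.0, 1.0]]

screen = apply_h((1.0, 2.0), M)            # texture (u, v) -> screen (x, y)
texture = apply_h(screen, adjoint3(M))     # screen (x, y) -> texture (u, v)
```

Because the adjoint differs from the true inverse only by the determinant, no division by the determinant is needed before the homogeneous divide.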

Aoki and Levine demonstrate texture mapping polygons in perspective using 
formulas equivalent to the above [Aok78]. Smith proves that the division is
necessary in general, and shows how u and v can be calculated incrementally
from x and y as a polygon is scanned [Smi80]. As discussed earlier,
Catmull and Smith decompose perspective mappings into two passes of
shears and scales [Cat80]. Gangnet, Perny, and Coueignoux explore an
alternate decomposition which rotates screen and texture space so that the 
perspective occurs along one of the image axes [Gan82]. Heckbert promotes 
the homogeneous matrix notation for perspective texture mapping and 
discusses incremental techniques for scanning in screen space [Hec83]. 

Patches 

Texture mapping is quite popular for surfaces modeled from patches, probably
for two reasons: (a) the parameterization comes for free, (b) the cost of
texture mapping is small relative to the cost of patch rendering. Patches are 
usually rendered using a subdivision algorithm whereby screen and texture 
space areas are subdivided in parallel [Cat74], [Lan80]. As an alternative 
technique Catmull and Smith demonstrate, theoretically at least, that it is 
possible to perform texture mapping on bilinear, biquadratic, and bicubic 
patches with two-pass algorithms [Cat80]. Fraser, Schowengerdt, and 
Briggs explore a similar method for the application of geometric image distortions
[Fra85].


FILTERING 

After the mapping is computed and the texture warped, the image must be 
resampled on the screen grid. This process is called filtering. 

The cheapest texture filtering method is point sampling, wherein the pixel 
nearest the desired sample point is used. It works relatively well on unscaled
images, but for stretched images the texture pixels are visible as large blocks 
and for shrunk images aliasing can cause distracting moire patterns. 

Aliasing 

Aliasing can result when a signal has unreproducible high frequencies 
[Cro77], [Whi81]. In texture mapping, it is most noticeable on high contrast, 
high frequency textures. Rather than accept the aliasing which results from 
point sampling or avoid those models which exhibit it, we prefer a high 
quality, robust image synthesis system which does the extra work required to 
eliminate it. In practice, total eradication of aliasing is often impractical and
we must settle for approximations which merely reduce it to unobjectionable
levels.

Two approaches to the aliasing problem are: 

a) Point sample at higher resolution 

b) Low pass filter before sampling 

The first method theoretically implies sampling at a resolution determined by 
the highest frequencies present in the image. Since a surface viewed 
obliquely can create arbitrarily high frequencies, this resolution can be 
extremely high. It is therefore desirable to limit dense supersampling to 
regions of high frequency and high contrast [Cro82] by adapting the sampling
rate to the local intensity variance [Lee85], [Dip85]. This is not possible,
however, in vectorized algorithms, which must choose a uniform sampling
rate a priori and accept any residual aliasing. Whether adaptive or uniform
point sampling is used, stochastic sampling can improve the appearance
of images significantly by trading off aliasing for noise [Coo86].

The second method, low pass filtering before sampling, is preferable because 
it addresses the causes of aliasing rather than its symptoms. To eliminate 
aliasing the signals must be band-limited (contain no frequencies above the
Nyquist limit). When a signal is warped and resampled the following steps 
must theoretically be performed [Smi83]: 

1. reconstruct continuous signal from input samples by convolution

2. warp the abscissa of the signal

3. low pass filter the signal using convolution 

4. resample the signal at the output sample points 

These methods are well understood for linear warps, where the theory of
linear systems lends support, but for nonlinear warps such as perspective the 
theory is lacking and a number of approximate methods have sprung up. 

Space Invariant Filtering 

For affine image warps the filter is space invariant; the filter kernel remains
constant as it moves across the image. The four steps above simplify to: 

1. low pass filter the input signal using convolution

2. warp the abscissa of the signal

3. resample the signal at the output sample points

Space invariant convolutions are often done using an FFT, multiply, and
inverse FFT [Opp75]. The cost of this operation is independent of the kernel 
size. 

Direct Convolution 

Nonlinear mappings have space-variant filter kernels (in texture space),
which require more complex filtering methods. In general, a square screen
pixel which intersects a curved surface has a curvilinear quadrilateral preimage
in texture space. Most methods approximate the true mapping by the
locally tangent perspective or linear mapping, so that the curvilinear preimage
is approximated by a quadrilateral or parallelogram. In place of the
ideal low pass filter, a sinc, a finite impulse response (FIR) approximation is
used to form a weighted average of texture samples.

We now summarize several direct convolution texture filters. 

Catmull, 1974 

In his subdivision patch renderer, Catmull computes an unweighted average
of the texture pixels corresponding to each screen pixel [Cat74]. He gives 
few details, but it appears his filter is a quadrilateral with a box kernel cross 
section. 

Blinn and Newell, 1976 

Blinn and Newell improve on this with a triangular kernel which forms overlapping
square pyramids 2 pixels wide in screen space [Bli76]. At each
pixel the pyramid is distorted to fit the approximating parallelogram in texture
space, and a weighted average is computed.

Feibush, Levoy, and Cook, 1980 

The filter used by Feibush, Levoy, and Cook is more elaborate [Fei80]. 

The following steps are taken at each screen pixel: 

(1) Center the kernel (box, cylinder, cone, or gaussian) on the pixel and 
find its bounding rectangle. 

(2) Transform the rectangle to texture space, where it is warped into a 
quadrilateral. The sides of this quadrilateral are assumed to be 
straight. Find a bounding rectangle for this quadrilateral.

(3) Map all pixels inside the texture space rectangle to screen space. 

(4) Form a weighted average of the mapped texture pixels using a two- 
dimensional lookup table indexed by each sample’s location within the 
pixel. 

Since the kernel is in a lookup table it can be a gaussian or other high quality
filter. 

Gangnet, Perny, and Coueignoux, 1982

The texture filter proposed by Gangnet, Perny, and Coueignoux is quite similar
to [Fei80], but they subdivide uniformly in screen space rather than texture
space [Gan82].

Pixels are assumed circular and overlapping. The preimage of a screen circle
is a texture ellipse whose major axis corresponds to the direction of
greatest compression. A square intermediate supersampling grid which is
oriented orthogonally to the screen is constructed. The supersampling rate is
determined from the longest diagonal of the parallelogram approximating
the texture ellipse. Each of the sample points on the intermediate grid is
mapped to texture space and bilinear interpolation is used to reconstruct the
texture values at these sample points. The texture values are then weighted
by a truncated sinc 2 pixels wide in screen space and summed.

The paper contrasts [Fei80]’s “back transforming” method with [Gan82]’s
“direct transforming” method, claiming that the latter produces more accurate
results because the sampling grid is in screen space rather than texture
space. Other differences are more significant. For example, [Gan82]
requires a bilinear interpolation for each sample point while [Fei80] does
not. Also, [Gan82] samples at an unnecessarily high frequency along the
minor axis of the texture ellipse. For these two reasons, [Fei80] is probably
faster than [Gan82] (an estimate denied in [Gan84]).




Greene and Heckbert, 1986 

The elliptical weighted average filter (EWA) proposed by Heckbert 
[Gre86b] is similar to [Gan82] in that it assumes overlapping circular pixels 
which map to arbitrarily oriented ellipses, and like [Fei80] because the kernel
is stored in a lookup table, but instead of mapping texture pixels to screen
space, the kernel is mapped to texture space, as in [Bli76]. The kernel, a circularly
symmetric function in screen space, is warped by an elliptic paraboloid
function into an ellipse in texture space. The elliptic paraboloid is
computed incrementally and used for both ellipse inclusion testing and kernel
table index. The cost per texture pixel is just a few arithmetic operations,
in contrast to [Fei80] and [Gan82], which both require mapping each texture
pixel from texture space to screen space or vice-versa.
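The incremental evaluation of the elliptic paraboloid can be sketched as follows. This is a simplified illustration, not the implementation of [Gre86b]: it assumes the ellipse is the image of a unit-radius screen circle under the local partial derivatives, evaluates a gaussian directly instead of indexing a lookup table, and uses a hypothetical falloff parameter `alpha`.

```python
import math


def ewa_filter(tex, u0, v0, ux, vx, uy, vy, alpha=2.0):
    """Gaussian-weighted average of tex over an ellipse centered at (u0, v0).

    (ux, vx) and (uy, vy) are the partial derivatives of (u, v) with respect
    to screen x and y. Q = A*U^2 + B*U*V + C*V^2 is the elliptic paraboloid;
    texels with Q < F lie inside the ellipse.
    """
    A = vx * vx + vy * vy
    B = -2.0 * (ux * vx + uy * vy)
    C = ux * ux + uy * uy
    F = (ux * vy - uy * vx) ** 2
    if F == 0:
        raise ValueError("degenerate ellipse")
    # Bounding box of the ellipse, clipped to the texture.
    half_u, half_v = math.sqrt(C), math.sqrt(A)
    u_lo = max(0, math.ceil(u0 - half_u))
    u_hi = min(len(tex[0]) - 1, math.floor(u0 + half_u))
    v_lo = max(0, math.ceil(v0 - half_v))
    v_hi = min(len(tex) - 1, math.floor(v0 + half_v))
    num = den = 0.0
    ddq = 2.0 * A                          # constant second difference of Q in u
    for v in range(v_lo, v_hi + 1):
        V = v - v0
        U = u_lo - u0
        q = A * U * U + B * U * V + C * V * V
        dq = A * (2.0 * U + 1.0) + B * V   # first difference of Q in u
        for u in range(u_lo, u_hi + 1):
            if q < F:                      # ellipse inclusion test
                w = math.exp(-alpha * q / F)
                num += w * tex[v][u]
                den += w
            q += dq                        # two adds update Q per texel
            dq += ddq
    if den == 0.0:                         # no texel center inside: nearest texel
        vi = min(max(round(v0), 0), len(tex) - 1)
        ui = min(max(round(u0), 0), len(tex[0]) - 1)
        return tex[vi][ui]
    return num / den


flat = [[5.0] * 8 for _ in range(8)]
ramp = [[float(8 * y + x) for x in range(8)] for y in range(8)]
flat_value = ewa_filter(flat, 3.5, 3.5, 2.0, 0.5, 0.0, 1.5)
center_value = ewa_filter(ramp, 2, 2, 1.0, 0.0, 0.0, 1.0)
```

The same value of Q serves both purposes named above: the comparison with F decides inclusion, and Q/F selects the kernel weight.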

Comparison of Direct Convolution Methods 

All five methods have a cost per screen pixel proportional to the number of 
texture pixels accessed, and this cost is highest for [Fei80] and [Gan82]. 
Since [Gre86b] has quality comparable to [Fei80] and [Gan82] at much 
lower cost, it appears to be the fastest algorithm for high quality direct
convolution.

Prefiltering the Texture 

Even with optimization, the methods above are often extremely slow, since a 
pixel preimage can be arbitrarily large along silhouettes or at the horizon. 
We would prefer a texture filter whose cost does not grow proportionately to 
texture area. 

To speed up the process the texture can be prefiltered so that during rendering
only a few samples will be accessed for each screen pixel. The access
cost of the filter will thus be constant, unlike direct convolution methods. 
Two data structures can be used for prefiltering: image pyramids and 
integrated arrays. 

Pyramidal data structures are commonly used in image processing and computer
vision [Tan75], [Ros84]. Their application to texture mapping was
apparently first proposed in Catmull’s PhD work [Smi79].

We now summarize several texture filters which employ prefiltering. 

Dungan, Stenger, and Sutty, 1978 

Dungan, Stenger, and Sutty prefilter their texture “tiles” to form a pyramid
whose resolutions are powers of two [Dun78]. To filter an elliptical texture
area one of the pyramid levels is selected based on the average diameter of
the ellipse and the level is point sampled. The memory cost for this type of
texture pyramid is 1 + 1/4 + 1/16 + ... = 4/3 times that required for an
unfiltered texture; only 33% more expensive.
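A minimal sketch of such a pyramid follows, assuming a square texture whose side is a power of two and a box filter for downsampling. The level selection rule (nearest power of two to the diameter) and the function names are assumptions for illustration; [Dun78] may differ in detail.

```python
import math


def build_pyramid(tex):
    """Box-filter a square 2^k x 2^k texture down to 1 x 1.

    Level 0 is the original; each level halves the resolution.
    """
    levels = [tex]
    while len(levels[-1]) > 1:
        p = levels[-1]
        n = len(p) // 2
        levels.append([[(p[2 * y][2 * x] + p[2 * y][2 * x + 1] +
                         p[2 * y + 1][2 * x] + p[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(n)] for y in range(n)])
    return levels


def sample_pyramid(levels, u, v, d):
    """Point sample the level whose texel size is nearest the diameter d.

    u, v, and d are in level-0 texel units, with u and v non-negative.
    """
    level = min(max(round(math.log2(max(d, 1.0))), 0), len(levels) - 1)
    size = len(levels[level])
    x = min(int(u) >> level, size - 1)
    y = min(int(v) >> level, size - 1)
    return levels[level][y][x]


tex = [[float(4 * y + x) for x in range(4)] for y in range(4)]
levels = build_pyramid(tex)
texel_count = sum(len(level) ** 2 for level in levels)   # 16 + 4 + 1
```

For this 4x4 example the pyramid holds 21 texels against 16 in the original, a ratio of about 1.31 that approaches 4/3 as the base resolution grows.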

Smith, 1979 

Smith describes the “mipmap”, which is a particular layout for color image
pyramids invented by Williams [Smi79]. Smith points out that the square 
filter area inherent in pyramids is inaccurate if the pixel preimage is 
elongated. 

Heckbert, 1983 

Heckbert describes Williams’ trilinear interpolation scheme for pyramids 
(see below) and its efficient use in perspective texture mapping of polygons 
[Hec83]. Choosing the pyramid level is equivalent to approximating a texture
quadrilateral with a square. The recommended formula for the diameter
d of the square is the maximum of the side lengths of the quadrilateral. 
Aliasing results if the area filtered is too small, and blurring results if it’s too 
big; one of these two is inevitable. 

Williams, 1983 

Williams improves upon [Dun78] by proposing a trilinear interpolation 
scheme for pyramidal images wherein bilinear interpolation is performed on 
two levels of the pyramid and linear interpolation is performed between 
them [Wil83]. The output of this filter is thus a continuous function of position
(u,v) and diameter d. His filter has a constant cost of 8 pixel accesses
and 7 multiplies per screen pixel. Williams uses a box filter to construct the 
image pyramid, but gaussian filters can also be used [Bur81]. 
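The trilinear scheme can be sketched on a box-filtered pyramid as below. The texel-center convention, the edge clamping, and the mapping from diameter d to a fractional level are assumptions of this sketch, not details given in [Wil83].

```python
import math


def build_pyramid(tex):
    """Box-filter a square 2^k x 2^k texture down to 1 x 1 (level 0 = original)."""
    levels = [tex]
    while len(levels[-1]) > 1:
        p = levels[-1]
        n = len(p) // 2
        levels.append([[(p[2 * y][2 * x] + p[2 * y][2 * x + 1] +
                         p[2 * y + 1][2 * x] + p[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(n)] for y in range(n)])
    return levels


def lerp(a, b, t):
    """Linear interpolation using a single multiply."""
    return a + (b - a) * t


def bilinear(img, x, y):
    """Four texel accesses and three lerps; coordinates clamp at the edges."""
    n = len(img)
    x = min(max(x, 0.0), n - 1.0)
    y = min(max(y, 0.0), n - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = x - x0, y - y0
    return lerp(lerp(img[y0][x0], img[y0][x1], fx),
                lerp(img[y1][x0], img[y1][x1], fx), fy)


def trilinear(levels, u, v, d):
    """Blend bilinear samples from the two levels that bracket diameter d."""
    lod = min(max(math.log2(max(d, 1.0)), 0.0), len(levels) - 1.0)
    lo = int(lod)
    hi = min(lo + 1, len(levels) - 1)

    def at(level):
        s = 2.0 ** level
        # Convert a level-0 coordinate to this level's grid (texel centers).
        return bilinear(levels[level], (u + 0.5) / s - 0.5, (v + 0.5) / s - 0.5)

    return lerp(at(lo), at(hi), lod - lo)


tex = [[float(4 * y + x) for x in range(4)] for y in range(4)]
levels = build_pyramid(tex)
sharp = trilinear(levels, 1, 2, 1.0)
coarse = trilinear(levels, 1.5, 1.5, 4.0)
```

Counting only the interpolation itself, each bilinear lookup takes 4 accesses and 3 multiplies, and the final blend adds one more multiply, which matches the 8 accesses and 7 multiplies quoted above.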


Gangnet and Ghazanfarpour, 1984

In Gangnet and Ghazanfarpour’s survey a variation on the image pyramid is
proposed which allows unequal filtering in u and v (they call it “asymmetrical”
filtering, but it is more properly termed “anisotropic”). The image is
prefiltered to resolutions of the form 2^Δu × 2^Δv, so this pyramid is four
dimensional: u, v, Δu, and Δv. Its memory requirements are four times that of an
unfiltered image, three times that of an isotropic pyramid, and the time cost
is 16 texture pixel accesses and 15 multiplies per screen pixel.

Greene and Heckbert, 1986 

Attempting to decouple the data structure from the access function, Greene 
suggests the use of the EWA filter on an image pyramid [Gre86b]. Unlike 
the other prefiltering techniques such as trilinear interpolation on a pyramid 
or the summed area table, EWA allows arbitrarily oriented ellipses to be 
filtered. 

Crow, 1984 

Crow proposes the summed area table, an alternative to the pyramidal filtering
of [Dun78] and [Wil83], which allows orthogonally oriented rectangular
areas to be filtered in constant time [Cro84]. The original texture is preintegrated
in the u and v directions and stored in a high-precision summed
area table. To filter a rectangular area the table is sampled in four places 
(much as one evaluates a definite integral by sampling an indefinite integral). 
To do this without artifacts requires 16 accesses and 14 multiplies in general, 
but there is an optimization for large areas which cuts the cost to 4 accesses 
and 2 multiplies. The high-precision table requires 4 times the memory cost 
of the original image. The summed area table is generally more costly than 
the texture pyramid in both memory and time, but it can perform better filtering
than the pyramid, since it filters rectangular areas, not just squares. It
clearly outperforms the four-dimensional pyramid in [Gan84]. 
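The four-sample lookup can be sketched for rectangles whose corners fall on texel boundaries. This covers only the simple case; the 16-access version mentioned above would interpolate the table at fractional corners. The function names and the extra row and column of zeros are choices made for this sketch.

```python
def build_sat(tex):
    """Build a summed area table with one extra row and column of zeros.

    sat[y][x] is the sum of tex over all rows < y and columns < x.
    """
    h, w = len(tex), len(tex[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += tex[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row
    return sat


def box_average(sat, x0, y0, x1, y1):
    """Average of texels with x0 <= x < x1 and y0 <= y < y1.

    The sum comes from four table samples, much like evaluating a definite
    integral from an indefinite one.
    """
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


tex = [[4 * y + x for x in range(4)] for y in range(4)]
sat = build_sat(tex)
whole = box_average(sat, 0, 0, 4, 4)
strip = box_average(sat, 0, 0, 2, 1)
```

The table entries grow with the texture area, which is why a higher-precision table than the original image is needed.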

Perlin, 1985 

Perlin’s selective image filter is an elegant generalization of [Cro84], 
developed independently [Per85b]. If an image is preintegrated in u and v 
n times, an orthogonally oriented elliptical area can be filtered by sampling
the array at (n+1)^2 points and weighting them appropriately. The effective
kernel is a box convolved with itself n times whose size can be selected at
each screen pixel. If n=0 the method degenerates to point sampling, if n=1
it is equivalent to the summed area table with its box kernel, n=2 uses a triangular
kernel, and n=3 uses a parabolic kernel. With increasing n the kernel
approaches a gaussian, and the memory and time costs increase.
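The idea of repeated integration is easiest to see in one dimension. The sketch below is a 1-D analogue under that simplifying assumption, not the 2-D filter of [Per85b]: a signal integrated twice yields a triangle-weighted average from three samples weighted 1, -2, 1, for any kernel width w. The function names are illustrative.

```python
from itertools import accumulate


def integrate(f):
    """Running sum with a leading zero: out[i] = f[0] + ... + f[i-1]."""
    return [0] + list(accumulate(f))


def triangle_filter(s2, i, w):
    """Triangle-weighted average of f[i .. i+2w-2] from three samples.

    s2 is the signal integrated twice. The weights rise 1, 2, ..., w and
    fall back to 1, and they sum to w*w.
    """
    return (s2[i + 2 * w] - 2 * s2[i + w] + s2[i]) / (w * w)


f = [1, 2, 3, 4, 5]
s2 = integrate(integrate(f))
narrow = triangle_filter(s2, 0, 2)   # weights 1, 2, 1 over f[0..2]
wide = triangle_filter(s2, 0, 3)     # weights 1, 2, 3, 2, 1 over f[0..4]
```

In two dimensions the same second difference is applied along both axes, which gives the 3 x 3 = 9 samples of the n=2 case.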


Comparison of Prefiltering Methods 

The following table summarizes the prefiltering methods we have discussed: 


REF.     KERNEL    SHAPE      DOF  TIME            MEMORY
ptsamp   impulse   point      2    1,0             1
Dun78    box       square     3    1,0             1.33
Wil83    box       square     3    8,7             1.33
Gan84    box       rectangle  4    16,15           4
Gre86b   any       ellipse    5    ?               1.33
Cro84    box       rectangle  4    16,14 or 4,2    4
Per85b   triangle  ellipse    4    36,31 or 9,4    6


The pair of numbers under ‘time’ is the number of texture pixel accesses and
the number of multiplies per screen pixel. The DOF (degrees of freedom) of 
the filter provides an approximate ranking of filter quality; the more degrees 
of freedom are available the greater is the kernel shape control. 

We see that the integrated array techniques [Cro84] and [Per85b] have rather 
high memory costs relative to the pyramid methods, but allow rectangular or 
orthogonally oriented elliptical areas to be filtered. Traditionally pyramid 
techniques have lower memory cost but allow only squares to be filtered. 
Since prefiltering usually entails a setup expense proportional to the square 
of the texture resolution, its cost is of the same order as direct convolution - 
if the texture is only used once. But if the texture is used many times, as part 
of a periodic pattern, or appearing on several objects or in several frames of 
animation, the setup cost can be amortized over each use.

Filtering in Frequency Space

An alternative to texture space filtering is to transform the texture to frequency
space and low pass filter its spectrum. This is most convenient when
the texture is represented by a Fourier series rather than a texture array.
Norton, Rockwood, and Skolmoski explore this approach for flight simulator
applications and propose a simple technique for clamping high frequency
terms [Nor82]. Gardner employs 3-D Fourier series as a transparency texture
function, with which he generates surprisingly convincing pictures of
trees and clouds [Gar85].
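Filtering a Fourier series texture can be sketched by simply omitting terms above the Nyquist limit of the local sampling rate. This hard truncation is a simplification chosen for illustration; the clamping technique of [Nor82] is not reproduced here, and the 1-D cosine series and parameter names are assumptions.

```python
import math


def bandlimited_series(amps, u, footprint):
    """Evaluate sum over k of amps[k-1] * cos(2*pi*k*u), dropping aliased terms.

    footprint -- spacing between adjacent screen samples in texture units.
    A term of frequency k is kept only while k <= 1 / (2 * footprint).
    """
    total = 0.0
    for k, a in enumerate(amps, start=1):
        if k * footprint <= 0.5:
            total += a * math.cos(2.0 * math.pi * k * u)
    return total


amps = [1.0, 0.5, 0.25]
near = bandlimited_series(amps, 0.0, 0.1)   # all three terms kept
mid = bandlimited_series(amps, 0.0, 0.2)    # third term dropped
far = bandlimited_series(amps, 0.0, 0.3)    # only the first term kept
```

A hard cutoff makes terms pop in and out as the footprint changes, so a practical system would fade each term out gradually near the limit.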

Perlin’s “Image Synthesizer” uses band limited pseudo-random functions as 
texture primitives [Per85a]. Creating textures in this way eases transitions 
from macroscopic to microscopic views of a surface; in the macroscopic 
range the surface characteristics are built into the scattering statistics of the 
illumination model, in the intermediate range they are modeled using bump 
mapping, and in the microscopic range the surface is explicit geometry 
[Per84]. Each term in the texture series can make the transition independently
at a scale appropriate to its frequency range.

Filtering Recommendations 

The best filtering algorithm for a given task depends on the texture representation
and scanning order in use. When filtering a texture array in a screen
order rendering system, the EWA filter [Gre86b], summed area table
[Cro84], or selective image filter [Per85b] are recommended because of their
good shape control and high speed. Since the above algorithms are still
under development and the EWA filter has yet to be tested on a pyramid, it is
too early to make definitive judgements. When the texture is a Fourier series,
filtering is simply a matter of clamping or truncating the high frequency
terms [Nor82]. In the case of arbitrary texture functions, which can be much
harder to integrate than texture arrays, adaptive stochastic sampling methods
are called for [Dip85]. Two-pass algorithms require 1-D space variant texture
filters.

Future research on texture filters will continue to improve their quality by 
providing greater kernel shape control while retaining low time and memory 
costs. One would like to find a constant-cost prefiltering method which 
filters arbitrarily oriented elliptical areas using a gaussian kernel. 

CONCLUSIONS 

System Support for Texture Mapping 

So far we have emphasized those tasks common to all types of texture mapping.
We now summarize some of the special provisions which a modeling
and rendering system must make in order to support different varieties of 
texture mapping. 

The primary requirements of standard texture mapping are texture space 
coordinates (u,v) for each screen pixel plus the partial derivatives of u and 
v with respect to screen x and y for good antialiasing (assuming that the 
rendering program is scanning in screen space). 

Bump mapping requires additional information at each pixel: two vectors 
tangent to the surface pointing in the u and v directions. For facet shaded 
polygons these tangents are constant across the polygon, but for Phong 
shaded polygons [Fol82] they vary. In order to ensure artifact-free bump 
mapping on Phong shaded polygons, these tangents must be continuous 
across polygon seams. One way to guarantee this is to compute tangents at 
all polygon vertices during model preparation and interpolate them across 
the polygon [Max86]. The normal vector can be computed as the cross
product of the tangents.
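As a small sketch of this step, tangents stored at two vertices can be interpolated and the normal recovered from their cross product. Linear interpolation along one edge, the vertex values, and the function names are assumptions made for illustration.

```python
import math


def lerp3(a, b, t):
    """Linearly interpolate between two 3-D vectors."""
    return tuple(p + (q - p) * t for p, q in zip(a, b))


def bump_normal(tu, tv):
    """Unit normal as the cross product of the u and v tangents."""
    nx = tu[1] * tv[2] - tu[2] * tv[1]
    ny = tu[2] * tv[0] - tu[0] * tv[2]
    nz = tu[0] * tv[1] - tu[1] * tv[0]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("tangents are parallel")
    return (nx / length, ny / length, nz / length)


# Hypothetical u tangents at two vertices; the v tangent is constant here.
tu_mid = lerp3((1.0, 0.0, 0.0), (1.0, 0.0, 1.0), 0.5)
tv_mid = (0.0, 1.0, 0.0)
n_mid = bump_normal(tu_mid, tv_mid)
```

Because the normal is derived from the interpolated tangents, all three vectors stay mutually consistent across a polygon seam as long as the vertex tangents are shared.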

Proper antialiasing of illumination mapping requires some measure of surface
curvature in order to calculate the solid angle of sky to filter. This is
usually provided in the form of the partials of the normal vector with respect 
to screen space. 

Although they are usually much more compact than brute force 3-D modeling
of surface details, texture maps can be bulky, especially when they
represent a high resolution image as opposed to a low resolution texture pattern
which is replicated numerous times. Keeping several of these in random
access memory is often a burden on the rendering program. This problem
is especially acute for rendering algorithms which generate the image in
scanline order rather than object order, since a given scan line could access 
hundreds of texture maps. Further work is needed on memory management 
for texture map access.

General

Texture mapping has become a widely used technique because of its generality
and efficiency. It has even made its way into everyday broadcast TV,
thanks to new real-time video texture mapping hardware such as the Ampex
ADO and Quantel Mirage. Rendering systems of the near future will allow
any conceivable surface parameter to be texture mapped. Despite the recent
explosion of diverse applications for texture mapping, a common set of fundamental
concepts and algorithms is emerging. We have surveyed a number
of these fundamentals: alternative techniques for parameterization, scanning,
texture representation, direct convolution and prefiltering.

Abstract

The concept of the keyframe-based subactor attempts to span
two major types of animation: parametric keyframe animation
and algorithmic animation. In a keyframe-based subactor, all
parameter values are defined by interpolation; however, if
there is a law defined for one parameter, this law is applied
and values computed by interpolation are ignored. The
application of keyframe-based subactors to human motion is
also discussed.

keywords: subactor, procedural law, parametric keyframe 
animation, algorithmic animation 

Résumé

The concept of the keyframe-based subactor is introduced in an
attempt to reconcile two main types of animation: parametric
keyframe animation and algorithmic animation. In a
keyframe-based subactor, all parameter values are defined by
interpolation; however, if a law is defined for a parameter,
this law applies and the values computed by interpolation are
ignored. An application of these keyframe-based subactors to
the animation of three-dimensional characters is also described.

keywords: subactor, procedural law, parametric keyframe
animation, algorithmic animation


Introduction 

There have been two major approaches in the design of 
animation control (Steketee and Badler 1985; Parke 1982; 
Zeltzer 1985; Magnenat-Thalmann and Thalmann 1985): keyframe
animation and algorithmic animation. The concept of the
keyframe-based subactor attempts to span both types of
animation.

Keyframe animation consists of the automatic generation of 
intermediate frames, called inbetweens, based on a set of 
keyframes supplied by the animator. There are two
fundamental approaches to keyframe animation: shape
interpolation and parametric keyframe animation.

Shape interpolation is the three-dimensional analog of
two-dimensional keyframing, introduced by Burtnyk and Wein
(1971). Inbetween frames are computed by interpolating
between the data points of the two objects.

In parametric keyframe animation systems (Steketee and 
Badler 1985; Parke 1982) inbetween frames are generated by
interpolating the transformation parameters and transforming
objects. 

In algorithmic animation, motion is algorithmically 
described. Physical laws are applied to parameters of the 
objects. Control of these laws may be given by programming
as in ASAS (Reynolds 1982) and MIRA (Magnenat-Thalmann
and Thalmann 1983) or using an interactive
director-oriented approach as in the MIRANIM (Magnenat-Thalmann
et al. 1985) system. With such an approach, any
kind of law may be applied to the parameters. For example, 
the variation of a joint angle may be controlled by kinematic 
laws as well as laws based on dynamic analysis (Badler 1984; 
Armstrong and Green 1985; Wilhelms and Barsky 1985). 

In keyframe animation, there are often undesirable effects
such as lack of smoothness and discontinuities in motion. To
reduce these effects, alternatives to linear interpolation
have been proposed by Baecker (1969), Burtnyk and Wein
(1976), Reeves (1981), and Kochanek and Bartels (1984).
However, according to Steketee and Badler (1985), with
shape interpolation, there is no totally satisfactory solution to
the deviations between the interpolated image and the object
being modeled, unless animators spend their time digitizing
almost every frame.

Algorithmic animation is an excellent approach for most
motions; however, it tends to be complex for specifying
human motions. Moreover, kinematic laws may sometimes be
completely unrealistic and laws based on dynamic analysis are
generally very expensive.

The concept of keyframe-based subactor 

An actor as defined by Reynolds (1982) is a graphical entity 
with a given role to play. A subactor (Magnenat-Thalmann 
and Thalmann, 1985b) is an entity which is dependent on an 
actor. This means that all motions applied to an actor are also 
applied to all its subactors. The reverse is not true. There are 
also two other advantages to the subactor approach: 

1. Any new subactor may be inserted as dependent on an
existing actor. 

2. Motions of different subactors may be coordinated and 
synchronized within an actor. 

A subactor is a variable of type subactor, which is a data 
abstraction formulation of a class of entities composed of 
objects and internal transformations applied to them. Formally 
a subactor communicates with other entities by means of 
parameters, which may be time-dependent. 

In a keyframe-based subactor, all parameter values are
defined by interpolation; however, if there is a law defined
for one parameter, this law is applied and values computed by
interpolation are ignored. This approach has great advantages.
Most of the parameters may be controlled by the keyframe
process, which is in fact less expensive, while more
realistic effects may be obtained for selected parameters.
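The precedence of laws over interpolated values can be sketched with a small class. This is an illustrative model only, not CINEMIRA-2 or MIRANIM code: the class name, the parameter name "elbow", and the use of linear interpolation are all assumptions.

```python
from bisect import bisect_right


class KeyframeSubactor:
    """Parameters are interpolated from keyframes unless a law overrides them."""

    def __init__(self):
        self.keys = {}   # parameter name -> sorted list of (time, value)
        self.laws = {}   # parameter name -> function of time

    def set_key(self, name, time, value):
        track = self.keys.setdefault(name, [])
        track.append((time, value))
        track.sort()

    def set_law(self, name, law):
        self.laws[name] = law

    def value(self, name, t):
        if name in self.laws:
            # A law defined for this parameter takes precedence;
            # the interpolated value is ignored.
            return self.laws[name](t)
        track = self.keys[name]
        times = [k[0] for k in track]
        i = bisect_right(times, t)
        if i == 0:
            return track[0][1]        # before the first keyframe
        if i == len(track):
            return track[-1][1]       # at or after the last keyframe
        (t0, v0), (t1, v1) = track[i - 1], track[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


arm = KeyframeSubactor()
arm.set_key("elbow", 0, 0.0)
arm.set_key("elbow", 10, 90.0)
interpolated = arm.value("elbow", 5)       # from the keyframes
arm.set_law("elbow", lambda t: 2.0 * t)    # hypothetical kinematic law
overridden = arm.value("elbow", 5)         # from the law
```

Only parameters that need a law pay its cost; every other parameter continues to use the cheaper keyframe interpolation.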

Keyframe-based subactors have been implemented as an 
extension of the MIRANIM system. 

Subactors in the MIRANIM system 

MIRANIM is an advanced system which allows the 
creation, manipulation and animation of realistic images. The 
most important features of MIRANIM are as follows: 

- basic geometric primitives 

- ruled and free-form surfaces 

- multiple cameras and stereoscopy 

- actor motions 

- multiple lights and spots, shadows (Magnenat-Thalmann 
and Thalmann, 1985) 

- transparency, three-dimensional texture, fractals, particle 
systems 

Image rendering may be performed by a scanline z-buffer 
algorithm or a ray tracing algorithm. 

MIRANIM is mainly based on three components: 

1) the object modelling and image synthesis system
BODY-BUILDING

2) the director-oriented animation editor ANIMEDIT 

3) the actor-based sublanguage CINEMIRA-2 

ANIMEDIT is a scripted system; the director designs a 
scene with decors, actors, cameras and lights. Each of these 
entities is driven by animated variables, which are, in fact, 
state variables following evolution laws. CINEMIRA-2 
allows the director to use programmers to extend the system. 
The great advantage of this is that the system is extended in a 
user-friendly way. This means that the director may 
immediately use the new possibilities. An entity programmed 
in CINEMIRA-2 is directly accessible in ANIMEDIT. This 
not only extends the system, but also enables specific 
environments to be created. For animation, CINEMIRA-2 
allows the programming of five kinds of procedural entities: 
objects, laws of evolution, actor transformations, subactors, 
animation blocks. 

A CINEMIRA-2 subactor is dependent on an actor which 
may be transformed in ANIMEDIT by a list of global 
transformations like translation, rotations, shear, scale, color 
transformation, flexion, traction. Several actors like these may 
participate in the same scene with other actors implemented 
using only algorithmic animation, cameras, lights and decor. 

Application of keyframe-based subactors to 
human motion 

A new system has now been designed and implemented: 
BODY-MOVING; this is a parametric key-frame animation 
system in which human bodies are mainly controlled by joint 
angles. BODY-MOVING is an interactive program that allows 
the user to build any sequence of motion for a given three- 
dimensional character. Actually, motion is controlled by 50 
joint angles. A keyframe is specified by modifying values for 
these angles from the previous keyframe in the sequence. 
Corrections may be done vertically for any keyframe, or 
horizontally for a given parameter in each keyframe. The 
animator may look at parameter values for any keyframe or 
interpolated frame. He/she also may obtain a wire-frame view 
of the human bodies for any frame. 

For each parameter, interpolation may be computed linearly 
or using bicubic splines (Kochanek and Bartels 1984). 
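
As a minimal sketch of the linear case only, the value of one joint angle at an arbitrary frame can be obtained from its keyframes as follows. The function name `interpolate_linear` and the `(frame, angle)` key layout are illustrative assumptions, not part of BODY-MOVING; the spline option is not shown.

```python
from bisect import bisect_right


def interpolate_linear(keys, frame):
    """Linearly interpolate one joint angle at `frame`.

    `keys` is a list of (frame_number, angle) pairs sorted by frame_number
    (a hypothetical layout chosen for this sketch).
    """
    frames = [f for f, _ in keys]
    # Outside the keyed range, hold the first or last key value.
    if frame <= frames[0]:
        return keys[0][1]
    if frame >= frames[-1]:
        return keys[-1][1]
    # Find the pair of keys that brackets the requested frame.
    i = bisect_right(frames, frame)
    (f0, a0), (f1, a1) = keys[i - 1], keys[i]
    t = (frame - f0) / (f1 - f0)
    return a0 + t * (a1 - a0)


keys = [(0, 0.0), (10, 90.0), (20, 45.0)]
print(interpolate_linear(keys, 5))   # halfway between the first two keys
```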

Once the motion of the three-dimensional character is 
designed, the character needs to be covered with surfaces. In 
our experimental system, we try to completely separate the 
topology of the surfaces from the wire-frame model. This 
means that parts of the human bodies may be designed using 
ruled surfaces such as revolution surfaces, free-form surfaces 
or three-dimensional reconstructed surfaces obtained from 
digitized projections. The system transforms the surfaces 
according to the wire-frame model assuring an automatic 
continuity between the different surfaces. This 
correspondence is based on a changing of reference systems 
independent of the segment length. This means that for the 
same set of surfaces, several bodies of different sizes may be 
obtained according to the segment length in the wire-frame 
models. This technique may be considered as a three- 
dimensional skeleton technique (Burtnyk and Wein 1976). 

For example consider a point between the elbow and the 
wrist; when we change the reference system, it is important to 
notice that both parts may be bent and/or twisted. This means 
that the surface must be extended on the external side of the 
elbow and twisted at the wrist, while preserving continuity. 

Integration strategy 

The integration of parametric keyframe animation and 
algorithmic animation has been performed considering that 
any human body designed with BODY-MOVING is a 
subactor in MIRANIM. This subactor has 50 real parameters, 
grouped in 17 three-dimensional vector parameters where 
each component is an angle. Each vector parameter is 
identified by a name; for example LEFTSHOULDER is the 
three-dimensional vector that controls the motion of the 
shoulder. If there is no law defined for this parameter, the 
interpolated values are taken. If there is a law defined for the 
parameter, this law is applied and values computed by 
interpolation are ignored. This approach of keyframe-based 
subactor has great advantages. Most of the angles may be 
controlled by the keyframe process, which is less expensive 
in fact; however more realistic effects may be performed on 
selected angles. For example, laws based on dynamics have 
been implemented using similar equations to those described 
by Armstrong and Green (1985). Of course, to obtain an 
angle following a law based on the dynamic analysis, 
dynamic properties like masses, forces, inertia matrices and 
torques have to be supplied. Intrinsic properties of bodies like 
masses and moments of inertia may be given at the creation 
of the surfaces in BODY-BUILDING. Forces and torques 
have to be specified as parameters of the laws. With our 
approach, expensive computations are performed only when 
absolutely necessary. Our approach to the integration of the 
different techniques is as follows: the joint angles must vary 
according to the values calculated by BODY-MOVING, but it 
is also possible to have one or more angles following a 
predefined law or programmed with the CINEMIRA-2 
sublanguage. 
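
The override rule described above can be sketched as follows: each vector parameter takes its interpolated value unless a law has been attached to it. This is only an illustration under stated assumptions; `resolve_parameters`, the dictionary layout and the name `LEFTELBOW` are hypothetical (only `LEFTSHOULDER` is named in the text), and the sinusoidal law merely stands in for a predefined or CINEMIRA-2 law.

```python
import math


def resolve_parameters(interpolated, laws, frame):
    """Return the vector parameters to use at `frame`.

    `interpolated` maps a parameter name to its interpolated 3-D angle vector.
    `laws` maps a parameter name to a function frame -> 3-D angle vector.
    When a law exists for a parameter, the interpolated value is ignored.
    """
    result = {}
    for name, value in interpolated.items():
        law = laws.get(name)
        result[name] = law(frame) if law is not None else value
    return result


# Hypothetical example: the elbow follows a law, the shoulder stays keyframed.
interpolated = {
    "LEFTSHOULDER": (10.0, 0.0, 5.0),
    "LEFTELBOW": (30.0, 0.0, 0.0),
}
laws = {"LEFTELBOW": lambda frame: (45.0 * math.sin(0.1 * frame), 0.0, 0.0)}
print(resolve_parameters(interpolated, laws, frame=12))
```

Because the law is looked up at every frame, replacing the entry in `laws` between frames changes the evolution of that angle from that frame on.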

To control algorithmically the evolution of an angle, the 
animator may use three kinds of laws: predefined laws, 
CINEMIRA-2 analytical laws and functions of a previous 
state. In this latter case, an evolution law may be completely 
changed at any time (and consequently at any frame); this 
allows the animator to adapt the evolution of a joint angle to a 
new situation. 

The integration approach has another important application: 
the relation between MIRANIM actors and human characters 
generated by BODY-MOVING. The typical case is when the 
value of a parameter (angle) of the human character has to be 
derived from data for an actor. For example, the MIRANIM 
actor is a ball and the human character receives the ball on the 
head. In this case, the motion of the character has to be 
controlled using data about the ball. To solve this case, our 
approach is to predefine functions which return at any frame 
the value of any parameter. These laws may be then applied to 
any animated variable which drives the motion of other 
actors, cameras or lights. 

Fig. 1 shows the integration of human keyframe-based 
subactors into the MIRANIM system. 

Abstract 

Water is the commonest of everyday substances and 
its many forms have provided subjects for artists for 
as long as art has existed. But in computer graphics, 
there seem to have been few attempts to make 
pictures of water. The reason for this is simple. 
Realistic pictures of water are very hard to produce. 
We examine some of the reasons for this difficulty 
and report on some of our own experiments. 

Background 

Water is all around us and plays a part in many 
natural scenes. But it rarely appears in computer 
graphics images. The last four years have seen an 
enormous increase in interest in modelling natural 
phenomena. Fournier [1982] and others have 
produced realistic terrains using approximations to 
fractal surfaces building on the ideas of Mandelbrot 
[1983]. Reeves [1983] has modelled fire and Gardner 
[1985] has made impressive clouds. Although Reeves 
suggests that his particle systems can be used to model 
water, his paper gives no example. Perlin [1985] has 
published a picture, Ocean Sunset, showing a 
representation of a seascape and this is probably the 
best representation to date. However, it shows only 
one appearance of water and it is not clear how the 
method should be generalised. Water reflections 
appear in Road to Point Reyes [Cook 1983] but they 
are very simple, as is the pool with ripples in 
Erehwon [Weliky 1985]. Nelson Max [1981] made 
some pictures of a pair of islands in the sunset and his 
discussion deals with some of the problems 
mentioned below. It is, unfortunately, difficult to 
assess Max’s ideas from the pictures themselves as he 
was hampered by using a display with only 256 
colours. 

Water is difficult to represent for several reasons. 
Most of the water we see is in motion. Its shape 
depends on this motion and the motion is very 
complicated. A full, hydrodynamic simulation can, in 
principle, provide us with a complete description of 
the shape of any mass of water, but the computational 
requirements would be huge. 

Even when we know the shape of a mass of water, it 
is still difficult to render because of its optical 
properties. Most of the light falling on it is refracted 
or reflected, but even light which passes through the 
water gets scattered in a more or less complex 
fashion. And the appearance is further complicated 
by the fact that any surface below the water is 
illuminated indirectly by rays focused and scattered 
by refraction at the surface. Water in lakes, ponds 
and puddles presents the simplest surface, a plane 
disturbed by combinations of waves. The wave 
shapes are affected by varying depth and boundary. 
Water flowing in streams and rivers is far more 
complicated. We have initiated a research project 
aimed at studying all aspects of the appearance of 
water. In particular we are experimenting with the 
technique of soft objects [Wyvill 1986] to provide a 
general model for the more complicated cases: 
streams, waterfalls and fountains. 

In this paper, we have confined ourselves to the study 
of pools of water with waves. We have not yet 
attempted animation, and we have avoided actual 
physical simulation. Our main purpose is to discover 
which features matter most when presenting a 
realistic picture of water with waves. Another way of 
looking at it is to ask which features can be omitted 
without making the picture unconvincing. Our 
standard of assessment is thus rather subjective. Still 
photographs of moving water often look very 
different from the original because we never observe 
waves over an area at the same moment. Ideally we 
should be trying to compare our artificial pictures 
with photographs of water in similar circumstances. 
This we have not done. 

This work has been conducted as part of the 
Graphicsland computer animation project at the 
University of Calgary [Wyvill 1985]. The 
Graphicsland system provides a set of programs 
which are used as tools for creating pictures and 
animations using three dimensional, computer 
models. The pictures we have created all use a ray 
tracing program which is part of the Graphicsland 
system. This program has been modified, however, 
to support some of our special techniques. 

Relevant Features 

The appearance of water in ponds, pools and puddles 
varies enormously. Sometimes the water is opaque 
(or very nearly so) because of mud or other particles 
and sometimes it is transparent. There is a range of 
colours which are convincing because they do occur 
in nature whereas other combinations evidently do 
not. To investigate all this systematically is a huge 
undertaking but we have made a start by examining 
those features in earlier work which seem less 
satisfactory. 

Wave Shapes Waves on water are complicated 
even in the restricted case of pools. The simplest 
component is a single wave front resulting from some 
disturbance at one point. The resulting wave consists 
of an up-and-down motion of water which spreads in 
all directions from the centre of the disturbance. At 
any point, the motion can be viewed as growing 
quickly in amplitude as the wave front first arrives 
and then slowly decaying. In practice, such a single 
wave is almost never seen. Wind and other forces 
continuously make disturbances at many points on the 
surface and any travelling wave produces secondary, 
reflected waves wherever it meets the edge of the 
water. 

Nelson Max [1981] went to some trouble to find out 
what was a reasonable shape for a water wave profile. 
It seemed to us that the shape produced by just two 
intersecting waves was so complicated that one would 
not expect to detect a 'wrong' shape just by eye. Yet 
Max's sea waves look very wrong in the vicinity of 
his islands. The reason for this is that the waves do 
not show any radical change of appearance as they 
approach the islands. The sloping beach simply 
intersects what looks like a deep-sea wave. In the 
presence of an important optical cue such as the 
beach, we have expectations which must be met if the 
picture is to look convincing. In Ocean Sunset [Perlin 
1985], we see a remarkably convincing picture of 
waves on the ocean. Because there are no other 
objects to give us impressions of scale or expectations 
about the waves, we are easily convinced. Can we 
make adequate pictures of water in a context where 
we have expectations about the size and behaviour of 
the waves? 

To produce realistic wave shapes on a sloping beach 
is very difficult so we have chosen to model waves in 
a rectangular pool with vertical sides. We felt that it 
was better to use some context (unlike Perlin) but we 
needed to keep it simple enough that we could 
represent the boundary without recourse to detailed 
physical simulation. 

Max [1981] uses combinations of linear wave fronts. 
Perlin [1985] observing that these were too regular, 
combined spherical wave fronts from a collection of 
point sources whose positions were randomised. Our 
best results have also been obtained by adding waves 
from point sources. In some cases we have used a few 
sources placed at random, in others we have tried to 
use some knowledge about the scene to choose the 
source positions. The waves in Figure 5 are generated 
from a single main source together with eight 
'secondary' sources. These secondary sources have 
been so placed that they represent reflected images of 
the primary source in the sides of the pool. This 
means that where the waves meet the sides of the 
pool, their shape is consistent with the expectation 
that any wave meets a reflection of itself at the edge. 
Thus the waves of Figure 5 are, in a sense, 
characteristic of the waves in a rectangular 
swimming pool. 
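
One way to place such secondary sources is to mirror the primary source in the lines of the pool sides. The sketch below assumes an axis-aligned pool occupying [0, width] x [0, height] and takes the eight images to be the four single reflections plus the four double reflections through the corners; the text gives only the count, so this layout and the name `image_sources` are assumptions.

```python
def image_sources(sx, sy, width, height):
    """Mirror a source at (sx, sy) in the sides of a rectangular pool.

    The pool is assumed to span 0..width in x and 0..height in y.
    Returns eight image positions: four single reflections in the sides
    and four double reflections through the corners.
    """
    xs = (-sx, sx, 2.0 * width - sx)    # mirrored in x = 0, unmirrored, mirrored in x = width
    ys = (-sy, sy, 2.0 * height - sy)   # mirrored in y = 0, unmirrored, mirrored in y = height
    images = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i == 1 and j == 1:
                continue  # skip the primary source itself
            images.append((x, y))
    return images


for position in image_sources(1.0, 2.0, 4.0, 6.0):
    print(position)
```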


Figure 1 Wave source O, and images ®, in a pool 


Interestingly enough, there is no decay in our waves. 
They are combinations of sine waves travelling in 
different directions. The vertical displacement at any 
point x,y in Figure 5 is: 

$$\sum_i w_i \sin(k r_i)$$

where $r_i$ is the distance from x,y to one of the wave 
sources and $w_i$ is the amplitude of that source. The 
reason why this works is not clear. But it is a matter 
of common observation that waves in a swimming 
pool are of pretty uniform size due to the mixing of 
many disturbances. 
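
A direct evaluation of this sum might look like the following sketch. The function name `wave_height` and the `(sx, sy, w)` tuple layout for a source are illustrative assumptions; `k` is the same constant for every source, as in the formula.

```python
import math


def wave_height(x, y, sources, k):
    """Sum of undamped sine waves from point sources.

    `sources` is a list of (sx, sy, w) tuples: source position and amplitude.
    Each source contributes w * sin(k * r), with r the distance to (x, y).
    """
    return sum(
        w * math.sin(k * math.hypot(x - sx, y - sy))
        for sx, sy, w in sources
    )


sources = [(0.0, 0.0, 1.0), (2.0, 0.0, 0.5)]
print(wave_height(1.0, 0.0, sources, k=math.pi / 2))
```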

Colour Figure 8 shows blue, opaque water. Figure 
9 shows green, partially transparent water. Neither is 
correct. What you regard as acceptable depends on 
your expectations. If the swimmer is in a swimming 
pool, Figure 8 is more realistic. If he is in a canal, 
you will probably prefer Figure 9. In no case have we 
used a very reasonable optical model for the water. 
The colour in water is a function of depth. Light 
entering the water is absorbed and scattered so that 
every part of the water below the surface behaves as a 
secondary light source and the intensity of these 
sources is a decreasing function of depth. Light from 
these sources is further absorbed and scattered, so the 
colour of a body of water is difficult to predict 
theoretically even if the shape is very simple and 
reflection and refraction are ignored. Again, the 
complexity of this acts in our favour in that we do not 
know what to expect. In all our pictures, the colour is 
produced by a simple mixture of light reflected and 
refracted at the surface. No further filtering or 
volume dependent effects are used. 

Modelling technique Max [1981] uses a 
functional approximation to the shape of his waves 
and performs direct ray tracing. The ray 
intersections are found by iteration. Perlin [1985] is 
using a scan line algorithm and solid texture 
mapping. This method is very versatile and enables 
colour and texture to be changed by post processing 
the pixel buffer after the scene has been rendered. 
The method does not, however, support ray tracing 
so no reflections are possible. 

We wanted to compare different approaches so we 
have used three different techniques. The first is to 
construct an approximation to our wavy surface 
using polygons. This is shown in Figure 5. Polygons 
are not very suitable, but the Graphicsland system 
provides an easy way to create and manipulate such 
models so this provided us with a convenient 
reference surface for no extra programming effort. 

The second technique is to use our wave function as a 
texture map. This basically follows Blinn's classical 
method of bump mapping [Blinn 1978] except that we 
are ray tracing. The intersection of each ray with the 
plane water surface is found and then a false surface 
normal is computed for that intersection point. The 
direction of reflected and refracted rays is then found 
using this false normal. This technique is not new. If 
it has been described explicitly we are not aware of it, 
but it has certainly been used in published pictures, 
e.g. Erehwon [Weliky 1985]. Bump mapping works 
because we cannot perceive the three dimensional 
position of part of a distant surface very accurately. 
We deduce the finer detail of shape from light, shade 
and reflections from the surface. This is another 
reason why we do not think that the shape of the wave 
profile is very important. Indeed we do not know 
what shape our texture-mapped waves are supposed 
to be, only how the normals are perturbed. 

Our third technique is a new application of 
displacement mapping [Cook 1984]. The idea of a 
displacement map in scan-line algorithms is to 
produce local variations of shape by calculating for 
each surface element a displaced position in space. 
Using this method, a block can be represented by just 
a few polygons, and given a curved, sculptured 
appearance as it is rendered. The advantage of the 
displacement map is that it enables us to represent the 
profile where the waves meet the pool side. This 
feature is missing in the texture-mapped examples. 

Experimental Method 

We have set up some standard scenes using the tools 
of Graphicsland and rendered them using a specially 
modified ray tracing program. This has enabled us to 
generate pictures showing a simulated water surface 
in which we can change relevant variables selectively. 
Thus in one picture we make the water opaque, in 
another transparent. We can use different patterns of 
surface wave and experiment with our three different 
rendering techniques. 



The Ray Tracer 

The principal enhancement of our ray tracer is the 
ability to recognise ray intersection with an object to 
which texture mapping applies. Thus the intersection 
with the water plane is given a surface normal which 
has been modified by our wave function. This is 
illustrated in Figure 2. Every point in the plane of the 
water surface has its normal vector modified by the 
wave function. The modifications calculated for all 
the wave sources are added as vectors to give a 
normal vector which can deviate from the vertical by 
a small amount in any direction. This calculation is 
very fast, but not quite the same as finding the true 
normal of the combined wave. At this stage, we do 
not know how important the difference is, but the 
reflections in Figures 8 and 9 seem good enough. 
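
The exact per-source modification is not given here, so the following is only a sketch of one plausible form. It assumes each source contributes a horizontal tilt equal to the slope of its own sine wave, adds the tilts to the vertical, and normalises; the reflected direction then follows from the usual mirror formula. The names `false_normal` and `reflect` are hypothetical, and the faster approximation used in our ray tracer may differ from this.

```python
import math


def false_normal(x, y, sources, k):
    """Perturbed unit normal for the plane z = 0 at the point (x, y).

    `sources` is a list of (sx, sy, w) tuples. Each source adds a horizontal
    tilt taken from the slope of w * sin(k * r) along its radial direction.
    """
    gx = gy = 0.0
    for sx, sy, w in sources:
        dx, dy = x - sx, y - sy
        r = math.hypot(dx, dy)
        if r == 0.0:
            continue  # the radial direction is undefined exactly at a source
        slope = w * k * math.cos(k * r)
        gx += slope * dx / r
        gy += slope * dy / r
    length = math.sqrt(gx * gx + gy * gy + 1.0)
    return (-gx / length, -gy / length, 1.0 / length)


def reflect(direction, normal):
    """Mirror an incoming direction about a unit normal."""
    dot = sum(a * b for a, b in zip(direction, normal))
    return tuple(a - 2.0 * dot * b for a, b in zip(direction, normal))


n = false_normal(1.0, 0.0, [(0.0, 0.0, 0.1)], k=2.0)
print(n, reflect((0.0, 0.0, -1.0), n))
```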

To produce a profile edge more cheaply than by 
constructing a genuinely wavy surface, we have also 
experimented with a displacement map. The idea is 
that we calculate an ordinary point of intersection 
with the plane surface which is our water. Then we 
modify the point of intersection, making it nearer or 
further from the eye, along the line of sight, 
according to our wave function. Having got a 
modified point of intersection we then check back 
against adjacent objects in case, after all, the point of 
intersection is now further from the eye than a point 
of intersection with some other object. For example, 
in Figure 3, the intersection of the ray and plane 
surface is in front of the pool's side. But the modified 
intersection is behind. In this case the ray tracer 'sees' 
the pool's side, not the water. The effect is to produce 
an artificial profile edge, Figure 6. 
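
The check-back step can be sketched as follows, under several assumptions that are ours for the illustration: the water plane is z = 0, the eye is above it, a positive wave height moves the intersection nearer to the eye by the same amount along the ray, and `t_other` is the ray distance to the nearest adjacent object (for example the pool's side). The name `displaced_hit` is hypothetical.

```python
def displaced_hit(origin, direction, height_at, t_other=None):
    """Decide what a ray sees after displacing its water intersection.

    `origin` and `direction` are 3-D tuples; `height_at(x, y)` is the wave
    function; `t_other` is the distance along the ray to an adjacent object,
    or None if there is none. Returns (kind, distance) or None on a miss.
    """
    oz, dz = origin[2], direction[2]
    if dz >= 0.0:
        return None  # the ray never reaches the plane z = 0 from above
    # Ordinary intersection with the undisturbed plane surface.
    t_plane = -oz / dz
    x = origin[0] + t_plane * direction[0]
    y = origin[1] + t_plane * direction[1]
    # Move the hit along the line of sight (assumed sign: higher water is nearer).
    t_water = t_plane - height_at(x, y)
    # Check back against the adjacent object with the modified distance.
    if t_other is not None and t_other < t_water:
        return ("other", t_other)
    return ("water", t_water)


# A trough pushes the water hit behind the pool side, so the side is seen.
print(displaced_hit((0.0, 0.0, 1.0), (0.6, 0.0, -0.8), lambda x, y: -0.2, t_other=1.3))
```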


Figure 3 Displacement Mapping (diagram labels: Side of Pool, Displaced Intersection) 


Once again, the wave shape calculated by this 
displacement map is different from the original. 
What is worse, the effective shape of the wave 
depends on the angle of view. Our ray tracer uses a 
system of uniform space division to reduce the 
number of ray intersection calculations, and tests for 
adjacent objects are confined to the current volume of 
space division. For these reasons the method is less 
general than we would like, but it does offer a cheap 
way to represent the wave profile. 


Max [1981] reports that in his scenes 10 to 15% of 
rays were reflected twice by the water surface. Our 
texture mapping and displacement mapping 
techniques do not allow for this possibility. We are 
not sure how important this is. Max shows two 
pictures rendered with and without this second 
reflection, but there are other differences in the two 
scenes which make it difficult to be sure how much 
this affects our impression of the water. 

The photographs 

We present a selection of images to illustrate our 
techniques. Figure 4 is an abstract scene using texture 
mapping for the water surface. Figure 5 shows the 
same scene with the polygonal representation. 
Displacement mapping is used in Figure 6 and the 
water colour and reflectivity have been changed in 
Figure 7. 

Figures 8 and 9 show the effect of tuning the 
variables in an attempt at realism. The water in 
Figure 9 is transparent: note the distorted lower jaw. 
Our swimmer has no body beneath the head so this 
picture is a little inconsistent. 

Conclusion 

We have conducted a series of experiments to 
determine how to make convincing pictures of water 
in pools and puddles. We have concentrated on using 
a simplified model of waves on the surface and a 
variety of 'tricks' to achieve reasonably fast 
rendering. 

By careful choice of reflectance, colour and other 
properties, we can make quite convincing pictures of 
water in this way. But there are many more avenues 
to be explored. So far, we have made no attempt to 
simulate the criss cross patterns of light due to 
refraction and focusing of light sources. This is an 
important feature of the appearance of shallow water. 

Figure 4 Abstract scene 

Figure 5 Polygon water 

Figure 6 Displacement map 

Figure 7 Effect of colour changes 

Figure 8 Swimmer 

Figure 9 Effect of transparency 

ABSTRACT 

Natural, efficient communication depends upon shared 
representations. Current 3-D graphics systems, however, use 
representations that are quite distant from those which people 
use. The result is that construction of 3-D models is much 
like programming: meticulous translation from the person’s 
internal representation to the machine’s representation. We 
argue that a constructive solid geometry representation that 
allows stereotyped deformations and statistical specification closely 
parallels people’s internal representation. Such correspondence 
allows fast, “natural” 3-D modeling; this is especially 
important in the initial stages of the design process, where a “sketching” 
capability is more important than the ability for precise control of 
details. We describe and evaluate an interactive system that uses 
such a representation. The system demands real-time interaction; 
to support this on 68020-class machines we develop a linear-time 
hidden line algorithm, so that the hidden-line calculation requires 
only slightly more time than is needed to draw the lines. 

1 Sketching versus Detailing 

The distinction between sketching and detailing is 
important in understanding how people create a 3-D model. For 
instance, engineers typically sketch a new part using paper and 
pencil, and then give the sketch to a draftsman who uses a CAD 
system to complete the detailed specification of the model. Similarly, 
animators sketch out scenes and actions before drawing careful 
renditions of the sequence. The reason that people standardly 
divide the design process into two stages, each employing its 
own media, is that there are two conflicting sets of 
requirements: the initial design of a 3-D model (i.e., 3-D sketching) 
demands the ability for quick, general-purpose, and natural 
interaction, while the final drafting or rendering stage demands the 
ability for detailed, precise control. 

Most current 3-D graphics systems have the wrong “control 
knobs” for the initial, sketching phase of the design process; that 
is, the things you would like to do when “roughing in” a 3-D model 
aren’t usually easy to do. This makes things difficult; you have 
to approach the task of modelling a shape in a planned, 
methodical manner, much as a programmer approaches the problem of 
constructing a program.¹ Because you have to carefully plan your 
interaction with the machine, both engineers and graphic 
artists still sketch shapes on paper before attempting to use a 3-D 
modeling system. 

The use of paper for sketches and computers for final models 
is bad for exactly the same reasons that the use of paper for final 
models is bad: lack of flexibility, unneeded duplication of effort, 
no library of previous drawings, and so forth. In an attempt 
to address these problems we set out to develop a 3-D modeling 
language, user interface, and rendering system that is sufficiently 
“natural” and interactive that people would choose to sketch 
shapes on the computer rather than sketching them out on paper. 

The idea, then, was to develop a tool that allows the user 
to very quickly build or modify a 3-D model; to replace the 
pencil and paper. A user would directly sketch 3-D form on the 
computer, playing with the shape until it looks right, rather 
than approaching the modeling task as one of entering a 
carefully predefined model into the computer. An engineer would 
quickly “sketch” a new part directly on the computer, playing 
with it until it satisfied him. An animator would “sketch” a scene 
and, Claymation-like, interactively modify the scene so as to step 
through key points in an action sequence. In both cases, once 
we are satisfied with this “sketch model,” we can then invest the 
time to carefully fill out the model’s details using a system that 
is specialized for that particular task. 

We want, therefore, a tool that is not specialized to any 
one application domain but, like pencil and paper, is equally 
applicable to any 3-D modeling task. And further, like pencil 
and paper, we want this modeling tool to be generally available: 
i.e., cheap enough to sit on everyone’s desk, so that they will 
actually use it. 

1.1 The Design of a Graphics System 

We have implemented our solution to these problems 
in a system called SuperSketch (named for “sketching” and 
“superquadrics”), which provides an environment for 
interactively sketching and rendering 3-D models. The specific major 
design criteria for SuperSketch were: 

(1) Representation: The system must have a 
communication metaphor (language) that closely matches the way people 
naively think about and discuss shape, to promote easy, natural 
communication between the user and the machine. 

(2) Interaction: The system must have an interaction 
interface that allows users to attain a level of “effortless” interactive 
control similar to that of an engineer or artist sketching in pencil. 

(3) Efficiency and Accessibility: If it is to be truly 
useful, the system must be efficient enough to allow “real-time” line 
drawings and rapid full color renderings on a computer 
inexpensive enough to sit on everyone’s desk; e.g., a Motorola 68020-class 
machine without additional hardware. 

In the following sections of this paper we will discuss how 
we have sought to meet each of these design criteria. 

2 Representation 

The process of constructing and animating a 3-D model 
is a process of communication between the machine and the 
human operator. Because communication depends upon 
having a shared representation of the situation, the development of 
natural, “effortless” methods for constructing and animating 3-D 
shapes depends upon having a representation that is isomorphic 
to that which people use. When the representation used by the 
machine doesn’t match the way the human operator thinks of 
the process, we get what I call the “Etch-A-Sketch” problem:² 
the system has the wrong control knobs. 

Figure 1. (a) A chair; naive subjects typically describe this as being 
formed from Boolean combinations of appropriately deformed modeling 
primitives, (b) a sampling of the basic forms allowed, (c) deformations of 
these forms. 

2.1 Man-machine interaction: building 3-D models 

As an illustration of why the way you represent a scene is 
important, imagine that you were looking at a chair such as is 
shown in Figure 1(a), and trying to figure out how to build a 
3-D model of it. When people verbally describe the shape of this 
chair, they typically [1,2] say things like 

“Well, the back of a chair is a sort of squarish, thin thing 
that has been bent slightly. The bottom of the chair is 
the same but thicker, and rotated 90°. The legs are long 
rectangular things stuck into the bottom of the chair, and 

People describe shapes in terms of combining “parts” to 
form prototypes, and in terms of certain standard deformations 
of those parts and prototypes. If the computer understood such 
descriptions, then you could enter the above description of a chair 
directly. You could construct a 3-D model almost as easily as you 
could produce a verbal description. 

Typically, however, the representation the computer uses is 
more like splines or polygons, so to enter the model you must 
adjust spline control points or enter polygon vertices to obtain 
a shape that matches your mental image of the desired form. 
Unfortunately, people do not “see” or (normally) think of objects 
in terms of polygons or splines. Thus the user is forced to 
carefully (and laboriously) translate between his mental concept of 
the shape and the computer’s representation, that is, to “program” in 
the base language that the computer uses. 

Thus we can liken building a 3-D model on most current 
day 3-D graphics systems to programming a computer in machine 
language: you can do anything, but it is often quite laborious. Nor 
will an elaborate human interface help much: such an interface 
is like providing the programmer with an assembly language and 
stepping debugger. Such tools are much better than machine 
language, but as long as the basic representation is unnatural for 
the user they still fall short of providing the advantages of a high 
level language. 

Thus it seems that if we could discover a concrete, 
mathematical version of the “parts” that people use to think about 
3-D shape, we could construct a graphics system that wouldn’t 
require the user to be a programmer: it wouldn’t require him to 
translate from the way he thinks of the problem to the way the 
computer represents the problem. 

2.2 Animation 

Similar problems arise when we turn from the problem of 
building 3-D models to the problem of animating them. Polygonal 
representations, for instance, are too fine-grained for ease of 
manipulation; often the path of each polygon must be separately 
controlled to produce natural motion. Similarly, spline 
representations have the problem that non-rigid motions require a very 
difficult-to-compute interpolation of the spline parameters. 

These difficulties arise because the grain size of the 
representations doesn’t match the grain size of the problem. Points in the 
world are not, typically, independent of each other, as they 
appear in fine-grained polygonal representations; they often move 
in concert, rigidly or elastically. Larger grain representations such 
as splines or Constructive Solid Geometry (CSG) systems have 
the opposite problem, as they assume the relationship between 
points to be fixed: animators, unfortunately, often want objects 
to move elastically, and to stretch or compress. 

For animation we need to have a representation that 
matches the grain size of the problem. The disciplines of 
mechanics, dynamics and kinematics provide a suggestion about 
how to represent objects for animation, for they represent objects 
as fixed, solid bodies that undergo translation, rotation and 
elastic or inelastic deformation. 

To model a blade of grass bending in the wind, for example, 
we would probably first take our polygon or spline description and 
find a simple mathematical model that was “similar”, e.g., a rigid 
rod. We would then compute the deformation caused by the wind 
pushing evenly along the length of the rod, and then finally map 
that deformation back to the polygon or spline representation of 
the actual shape. It is obvious that things would be simpler if our 
original representation for the blade of grass were the same one 
we used for computing the parameters of the bending motion; 
e.g., a single mathematical object, like the rod, that could then 
be deformed and rendered directly. 

As a more complicated example, consider the modeling of 
vibrational modes in the animation of biological forms. Muscles, 
joints and flesh are elastic, and so realistic biological motion must 
include bouncing and elastic deformation as well as translation 
and rotation; perhaps the best illustration of this is found in Walt 
Disney’s movies, e.g., the dancing dwarfs in “Snow White and the 
Seven Dwarfs.” 

When analyzing the vibrational modes of objects, the 
standard procedure is to break complex shapes into the union of 
simple convex shapes whose compression, extension and bending 
may be separately considered. Thus if we represent our shapes 
as unions of convex forms with later deformations (similar to 
the “parts” that people naturally use to describe shape) we 
will be more easily able to describe, compute, and constrain the 
parameters of motion because they will be relatively simple 
functions of the description. That is, a part-by-part description will 
provide the right “control knobs” for computing the parameters 
of motion. 

In summary, then, the fact that a “part” description 
is the basis for both people’s naive notions of form and for 
mechanics/dynamics/kinematics makes it seem likely that we 
can develop a descriptive vocabulary that will allow us to 
accurately model the world in terms of parts: a parameterized set 
of volumetric primitives that, in relatively simple combination, 
can be used to form rough-and-ready models of the objects in 
our world and how they behave. If we can develop such part-like 
modelling primitives then not only will animation become 
easier, but the problem of building 3-D models will become easier 
because people seem to think about shape in terms of such part 
descriptions. The first question to be answered, therefore, is what 
is the notion of “a part” that people use? 

2.3 People: Parts and Collective Abstractions 

A considerable amount is known about how people 
conceptualize 3-D shape. For instance, we have found that the chair 
example above is a general phenomenon, i.e., people describe form 
in terms of combinations of component parts, which in turn are 
described as modifications of standard prototypes. This sort of 
structuring of imagery was first explored by the classical Gestalt 
school of perceptual psychology [3,4], and today is the subject 
matter of a lively school investigating human perception [5,6,7]. 
Indeed, such a part-based, prototype-and-modification 
descriptive system seems to be common to all human spatial reasoning; 
the classic work by Rosch [8], for instance, supports this view: 
she showed that even primitive New Guinea tribesmen (who 
appear to have no concept of regular geometric shapes) form 
geometric prototypes in much the same manner as people from 
other cultures, and describe novel shapes in terms of differences from 
these prototypes. 

Nor is this purely a cognitive phenomenon. When images 
are stabilized on the retina, for instance, they seem to disappear 
because low-level mechanisms in the human visual system 
suppress anything that doesn’t move. [This is why you don’t see 
the veins in your retina.] What is interesting is that this 
disappearance doesn’t occur uniformly, but rather affects things in 
chunks: whole “parts” of objects fade and return, rather than 
line segments, random patches, or whole objects [9,10]. 

The central consensus of this research is that people see 
part boundaries as occurring at places of extremal curvature or 
at inflections (footnote 3); this leads to a characterization of 3-D parts as 
being Boolean combinations (specifically or’s and not’s) of convex 
“blobs” [11,12]. When there are specialized cues that indicate 
that two portions of a figure share a common history (e.g., 
pronounced axes of symmetry, parallelism, etc.) the human 
visual system groups these portions together into a single “part” 
[6,13,14]. Thus we must allow certain stereotyped deformations 
of our convex blobs to still be considered as a single “part.” But 
which deformations? 

We have found that in verbal descriptions of unfamiliar 
imagery (electron microscope images) people commonly employ 
a limited set of deformations: bending, tapering, and twisting 
[1,2,14]. We can also address the question of which deformations 
are allowable by examining the range of image cues that support 
the perception of a “deformed part.” When we do this we find 
that the most important grouping cues, symmetry and 
parallelism, allow reliable inference of bending and tapering, and 
perhaps of twisting in the case of square-edged or ruled forms 
[6,13]. Thus we will adopt bending, tapering and twisting as our 
sole allowable deformations. 

Complex natural surfaces. Things seem to happen 
somewhat differently, however, for complex natural forms such as 
clouds or mountains, perhaps because such natural shapes simply 
have too much detail to completely remember, and the details 
are too variable across instances of the same type of object. 
Experiments in human memory [15] suggest that for complex 
surfaces, e.g., a crumpled newspaper, people seem to abstract out a 
few properties such as “crumpledness” and a few major features 
of the shape such as the general outline. The rest of the structure 
is ignored; it is unimportant, random. 

The fractal-like stochastic representations recently 
developed in computer graphics mimic this sort of abstraction of 
qualitative properties like “crumpledness” by letting us 
qualitatively describe the morass of details by means of a statistical 
process. 

Interestingly, we have found that the parameters of these 
stochastic processes have a surprising amount of psychological 
reality. We have shown [16,17], for instance, that 
people’s perception of “roughness” versus “smoothness” varies 
as a linear function of the surface’s fractal scaling parameter 
[“fractal dimension”]. This result indicates that representations 
that incorporate such stochastic models are a start towards 
duplicating the sort of physically meaningful abstraction of shape 
that people accomplish. 

2.4 A Representational System 

The above considerations lead us to the following 
representational system, a system that we have found competent to 
accurately describe an extensive variety of natural forms (e.g., 
people, mountains, clouds, trees), as well as man-made forms, 
in a succinct and natural manner. The idea behind this 
representational system is to provide a vocabulary of models and 
operations that will allow us to model our world as the relatively 
simple composition of component “parts,” retreating to statistical 
description when the complexity of the scene becomes too large 
for convenient manipulation. 

The most primitive notion in this representation is analogous 
to a “lump of clay,” a modeling primitive that may be deformed 
and shaped, but which is intended to correspond roughly to our 
naive perceptual notion of “a part.” 

For this basic modeling element we use a parameterized 
family of shapes known as superquadrics [18,19], which are 
described (adopting the notation $\cos\eta = C_\eta$, $\sin\omega = S_\omega$) by the 
following equation: 

$$
\vec{X}(\eta,\omega) =
\begin{pmatrix}
C_\eta^{\epsilon_1} C_\omega^{\epsilon_2} \\
C_\eta^{\epsilon_1} S_\omega^{\epsilon_2} \\
S_\eta^{\epsilon_1}
\end{pmatrix}
$$

where $\vec{X}(\eta,\omega)$ is a three-dimensional vector that sweeps out 
a surface parameterized in latitude $\eta$ and longitude $\omega$, with the 
surface’s shape controlled by the parameters $\epsilon_1$ and $\epsilon_2$. This 
family of functions includes cubes, cylinders, spheres, diamonds 
and pyramidal shapes as well as the round-edged shapes 
intermediate between these standard shapes. Some of these shapes are 
illustrated in Figure 1(b). Superquadrics are, therefore, a 
superset of the modeling primitives commonly used in CSG systems. 
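
As an illustration of how such a surface can be swept out, the following is a minimal sketch that samples points on a unit superquadric. It assumes the standard form of the parameterization, with components (cos^e1(eta) cos^e2(omega), cos^e1(eta) sin^e2(omega), sin^e1(eta)), and it assumes a signed power so that fractional exponents remain real-valued. The function names and the sampling grid are hypothetical choices made for this example, not part of the system described here.

```python
import math


def signed_pow(base, exponent):
    """Return sign(base) * |base|**exponent (assumed convention for odd symmetry)."""
    return math.copysign(abs(base) ** exponent, base)


def superquadric_point(eta, omega, eps1, eps2):
    """Point on the unit superquadric at latitude eta and longitude omega."""
    c_eta, s_eta = math.cos(eta), math.sin(eta)
    c_om, s_om = math.cos(omega), math.sin(omega)
    x = signed_pow(c_eta, eps1) * signed_pow(c_om, eps2)
    y = signed_pow(c_eta, eps1) * signed_pow(s_om, eps2)
    z = signed_pow(s_eta, eps1)
    return (x, y, z)


def superquadric_mesh(eps1, eps2, n_lat=9, n_lon=16):
    """Sample the surface on a regular latitude/longitude grid."""
    points = []
    for i in range(n_lat):
        eta = -math.pi / 2 + math.pi * i / (n_lat - 1)
        for j in range(n_lon):
            omega = -math.pi + 2 * math.pi * j / n_lon
            points.append(superquadric_point(eta, omega, eps1, eps2))
    return points


# eps1 = eps2 = 1 gives a sphere; small values approach a cube.
sphere = superquadric_mesh(1.0, 1.0)
near_cube = superquadric_mesh(0.1, 0.1)
```

With both shape parameters equal to 1 every sampled point lies on the unit sphere, while values near 0.1 push the samples toward the faces and corners of a cube, which is the squareness-roundness control referred to in the text.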

These basic “lumps of clay” (with various symmetries and 
profiles) are used as prototypes that are then deformed by 
stretching, bending, twisting or tapering, and then combined using 
Boolean operations to form new, complex prototypes that may, 
recursively, again be subjected to deformation and Boolean 
combination. As an example, the chair in Figure 1(a) was constructed 
in much the manner that we have found people describe this 
shape: the back and seat are rounded-edge superquadric “cubes” 
that are flattened along one axis, and then bent somewhat to 
accommodate the rounded human form, etc. 

The mathematical basis for this portion of the descriptive 
language was originally developed by Barr [20], although he did 
not envision it as the basis of a general purpose modeling 
language. Nonetheless, his work has let us develop a vocabulary 
of form that closely mimics human notions of part structure 
and is considerably more powerful than traditional CSG 
representations. 

To illustrate the flexibility of this representation, consider 
the range of basic superquadric shapes, as shown in Figure 1(b). 
Already this is a superset of traditional modeling primitives, as it 
includes rounded shapes as well as traditional Platonic solids. By 
allowing the deformations that people employ in verbal 
descriptions (stretching, bending, tapering and twisting) we greatly 
expand the range of primitives allowed, as shown in Figure 1(c). 
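
The four deformations named above can be sketched as simple maps on surface points. The particular functional forms below (a linear taper, a constant-rate twist about the z axis, and a constant-curvature bend of the y axis) are assumptions chosen for illustration in the spirit of global deformations; the text does not give the exact forms used in the system, and all function and parameter names are hypothetical.

```python
import math


def stretch(p, sx, sy, sz):
    """Scale each axis independently."""
    x, y, z = p
    return (sx * x, sy * y, sz * z)


def taper(p, k):
    """Scale the x-y cross-section linearly with height z (assumed linear profile)."""
    x, y, z = p
    f = 1.0 + k * z
    return (f * x, f * y, z)


def twist(p, rate):
    """Rotate the x-y cross-section about the z axis by an angle rate * z."""
    x, y, z = p
    a = rate * z
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)


def bend(p, k):
    """Bend the y axis into a circular arc of radius 1/k in the y-z plane (k != 0)."""
    x, y, z = p
    theta = k * y            # arc length along y becomes an angle
    radius = 1.0 / k - z     # distance from the centre of curvature
    return (x, radius * math.sin(theta), 1.0 / k - radius * math.cos(theta))


# Deformations compose: flatten a point's part, taper it, then bend it.
p = bend(taper(stretch((1.0, 1.0, 1.0), 1.0, 1.0, 0.2), -0.5), 0.3)
```

Because each deformation is a function of a few scalar parameters, a deformed part is still described by a handful of numbers, which is what keeps the descriptions compact.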

Still, the most powerful notion in this language is that of 
allowing Boolean combination of the primitives. This intuitively 
attractive CSG approach (building specific object descriptions by 
applying the logical set operations “or” and “not” to component 
parts) introduces a language-like generative power that allows 
the creation of a tremendous variety of form, as is illustrated by 
the figures in this paper. 
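
One way to picture Boolean combination with only “or” and “not” is through point-membership tests. The sketch below assumes the usual implicit (inside-outside) form of a superquadric, which takes the value 1 on the surface and less than 1 inside; that function is standard in the superquadric literature but is not stated in this paper, and the helper names are hypothetical.

```python
def inside_outside(p, size=(1.0, 1.0, 1.0), eps1=1.0, eps2=1.0):
    """Implicit superquadric value: < 1 inside, 1 on the surface, > 1 outside."""
    x, y, z = (abs(c) / s for c, s in zip(p, size))
    xy = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + z ** (2.0 / eps1)


def make_part(center, size, eps1, eps2):
    """Return a membership predicate for one axis-aligned superquadric part."""
    def contains(p):
        local = tuple(c - o for c, o in zip(p, center))
        return inside_outside(local, size, eps1, eps2) <= 1.0
    return contains


def union(a, b):
    """Logical 'or' of two parts."""
    return lambda p: a(p) or b(p)


def negate(a):
    """Logical 'not' of a part."""
    return lambda p: not a(p)


def subtract(a, b):
    """a minus b, written using only 'or' and 'not': not(not a or b)."""
    return negate(union(negate(a), b))


# A rounded block with a spherical cavity carved out of its centre.
block = make_part((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.1, 0.1)
ball = make_part((0.0, 0.0, 0.0), (0.5, 0.5, 0.5), 1.0, 1.0)
hollow = subtract(block, ball)
```

Subtraction is derived here from the two operations the text names, which shows why “or” and “not” alone are sufficient to generate the other set operations.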

Figure 3. The SuperSketch viewports. The left viewport is an interactive 
view of the scene with hidden lines removed (the linear-time hidden surface 
algorithm is described in the following section). The right viewport is more 
like a wireframe model, so that objects are not lost to the user’s view. 


The ability to fix particular “lumps” within a given shape 
provides an elegant way to pass from a qualitative model of a 
surface to a quantitative one — or vice versa. We can refine a 
general model of the class “a mountain” to produce a model of a 
particular mountain by fixing the position and size of the largest 
lumps used to build the surface, while still leaving smaller details 
only statistically specified. Or we can take a very specific model 
of a shape, discard the smaller constituent lumps after calculating 
their statistics, and obtain a model that is less detailed than the 
original but which still appears qualitatively correct. 


Figure 2. (a) - (c) show the construction of a fractal shape by 
successive addition of smaller and smaller features with number of features and 
amplitudes described by the ratio 1/r. (d) Spherical shapes 

2.5 Complex inanimate forms 

To show how we may integrate the “part” representation 
discussed above with the textural abstractions needed to describe 
complex forms, let us first investigate a model of 3-D texture 
widely used in the graphics community: fractal Brownian 
functions. We randomly place $n^2$ large bumps on a plane (where 
$n$ is a constant chosen so that the bumps adequately fill out the 
plane), giving the bumps a Gaussian distribution of altitude (with 
variance $\sigma^2$), as seen in Figure 2(a). We then add to that $4n^2$ 
bumps of half the size, and altitude variance $\sigma^2 r^2$, as shown in 
Figure 2(b). We continue with $16n^2$ bumps of one quarter the size, 
and altitude variance $\sigma^2 r^4$, then $64n^2$ bumps one eighth size, and altitude 
variance $\sigma^2 r^6$, and so forth. The final result, shown in Figure 2(c), is a true 
Brownian fractal shape. The validity of this construction does 
not depend on the particular shape of the superquadric primitives 
employed; the only constraint is that the sum must fill out the 
Fourier domain. 
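
The recursive construction can be sketched in a few lines. The example below follows the schedule described above (four times as many bumps of half the size at each level, with the altitude variance multiplied by r squared), but it substitutes Gaussian-shaped bumps on a small unit-square grid for the superquadric lumps, and it stops after a fixed number of levels; those choices, and all names, are assumptions made for illustration only.

```python
import math
import random


def bump_schedule(n, levels, sigma, r):
    """Per level: (number of bumps, relative bump size, altitude variance)."""
    return [((4 ** k) * n * n, 0.5 ** k, sigma ** 2 * r ** (2 * k))
            for k in range(levels)]


def fractal_height_field(grid=16, n=2, levels=3, sigma=1.0, r=0.5, seed=0):
    """Sum randomly placed bumps at successively halved scales on a unit square."""
    rng = random.Random(seed)
    heights = [[0.0] * grid for _ in range(grid)]
    for count, size, variance in bump_schedule(n, levels, sigma, r):
        width = size / n  # assumed: n largest bumps span the square
        for _ in range(count):
            cx, cy = rng.random(), rng.random()
            amplitude = rng.gauss(0.0, math.sqrt(variance))
            for i in range(grid):
                for j in range(grid):
                    dx = (i + 0.5) / grid - cx
                    dy = (j + 0.5) / grid - cy
                    falloff = math.exp(-(dx * dx + dy * dy) / (2 * width * width))
                    heights[i][j] += amplitude * falloff
    return heights


surface = fractal_height_field()
```

Fixing the positions and amplitudes of the first level while leaving later levels random is the step that turns a generic fractal surface into a model of a particular object, as discussed in the text that follows.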

Different shaped lumps will, however, give different 
appearance or texture to the resulting fractal surface; this 
construction, therefore, lets us generalize the standard fractal 
constructive techniques to produce surfaces with varying lacunarity, etc. 
One particularly efficient way to produce such shapes is by 
convolution of appropriately scaled kernels over arrays filled with 
random noise (footnote 4). 

When the placement and size of these superquadric lumps 
is random, we obtain the classical Brownian fractal surface that 
has been the subject of much previous research. When the 
larger components of this sum are matched to a particular object, 
however, we obtain a description of that object that is exact to 
the level of detail encompassed by the specified components. 

This makes it possible to specify a global shape while 
retaining a qualitative, statistical description at smaller scales: to 
describe a complex natural form such as a cloud or mountain, we 
specify the “lumps” down to the desired level of detail by fixing 
the larger elements of this sum, and then we specify only the 
fractal statistics of the smaller lumps, thus fixing the qualitative 
appearance of the surface. Figure 2(d) illustrates an example of 
such a description. The overall shape is that of a sphere; to this 
specified large-scale shape, smaller lumps were added randomly. 
The smaller lumps were added with six different choices of r (i.e., 
six different choices of fractal statistics) resulting in six 
qualitatively different surfaces, each with the same basic spherical 
shape. 


3 Interaction 

The first design criterion of our system is a representation 
(metaphor) that is natural to the human user. The 
previous section described the metaphor used in this system: that 
of building objects from clay, a descriptive strategy people 
often spontaneously use and which they find natural, using our 
superquadric-based analogy to the human perceptual notion of 
“parts.” Thus the system presents the user with “lumps” of 
pliable material (like clay) that may then be formed by 
changing the parameters of the part-like primitives (e.g., modifying 
the squareness-roundness, length, amount of bending, etc.), and 
finally combined with other parts of the scene using Boolean 
operations (e.g., “or” and “not”). 

The second design criterion of our system is that it have a 
user interface that allows users to attain a level of “effortless” 
interactive control similar to that of an engineer or artist sketching 
in pencil. To provide accurate, complete real-time interactive 
feedback of the state of the 3-D model under construction, we 
decided to employ two engineering-style orthographic views (x-y 
and y-z) of line drawing sketches of the scene. This is shown in 
Figure 3. All hidden lines are removed in the x-y view (labeled 
“sketch of scene”), but in the “y-z view” only external facing 
surfaces are rendered, i.e., objects are seen as “transparent” 
wireframes, with only back-facing or intersecting portions of the 
wireframe removed. This partial hidden-surface presentation 
prevents objects from being lost to the user’s view. Objects can 
be moved, deformed, etc., and redisplayed using a two-hundred 
triangle line-drawing approximation to the underlying analytical 
form in about one-eighth of a second, thus providing the 
perception of smooth, “real-time” motion and deformation. 

4 Efficiency and Accessibility 

Central to the user’s impression of interactivity and 
“naturalness” is the real-time display of the current state of the 
3-D model. Unfortunately, this requirement is in direct conflict 
with the criterion that our system run on Motorola 68020-class 
machines. 

Polygon-based algorithms are fundamentally order $n \log n$ 
in the number of polygons, and z-buffer techniques, although 
linear in the number of polygons, are also linear in the number 
of pixels. Further, as we require the ability to perform Boolean 
combinations of our part primitives, we must also add in time 
for conversion to a standard polygon representation, which is 
typically order $n^2$. Thus achieving real-time display on these 
machines seems impossible with current algorithms, because of their 
fundamental computational complexity. 

We have therefore developed a hidden line algorithm that is 
linear in the number of polygons being modified. It is, as far as we 
have been able to determine, the only example of an incremental, 
linear-time algorithm other than z-buffer algorithms (footnote 6). This 
algorithm may be viewed as an analytical version of ray casting [22]. 

5 Human Interaction Performance 

We have set out to build a system that permits a user to 
quickly sketch a very wide range of form. How well have we really 
done? There are two ways to answer this question: One, have 
we developed a representation/metaphor that supports natural 
man-machine interaction? And two, have we constructed a system 
that permits quick, responsive modeling of form? Although we 
have not yet done the sort of careful psychophysical testing that 
motivated our development of the representation, we can give a 
subjective evaluation and a few quantitative benchmarks; these 
are reported below. 

A natural vocabulary? We have found that, as a rule, when 
we try to model a particular 3-D form using this system we 
naturally tend to describe the shape in a manner that 
corresponds to the organization our perceptual apparatus imposes 
upon the image, even to making the distinctions standardly made 
in English. That is, the components of the description match 
one-to-one with our naive perceptual notion of the “parts” in the 
figure. 

For instance, Figure 4 shows how the face is formed from 
the Boolean sum of several different primitives. The basic form 
for the head is a slightly tapered ellipsoid. To this basic form is 
added a somewhat cubical nose, bent pancake-like primitives for 
ears, bent thin ellipsoids for lips, and almond-shaped eyes, as is 
shown in Figure 4(a). Figure 4(b) shows the addition of rounded 
cheeks and a slightly pointed chin (is this Yoda from Star Wars?), 
and finally Figure 4(c) shows the addition of a squarish forehead 
and slightly fractalized hair. 

The smoothly shaded result is shown in Figure 4(d): it is 
a reasonably accurate human head, composed of only 19 
primitives, specified by slightly less than 130 bytes of information. The 
two scenes shown in Figure 5 are described in a similarly 
concise, natural fashion. Figure 5(a) contains only 56 primitives, or 
about 500 parameters/bytes of information. Figure 5(b) contains 
only 100 primitives (about 1000 parameters/bytes of 
information) despite the considerable detailing in the faces (see Figure 4). 
One should remember that this representation is not in any way 
tailored for describing the human form: it is a general-purpose 
vocabulary. 
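
As a rough check on these figures (a back-of-the-envelope calculation from the approximate numbers quoted above, not an additional measurement), the storage per primitive works out to roughly seven to ten bytes:

$$
\frac{130}{19} \approx 6.8, \qquad
\frac{500}{56} \approx 8.9, \qquad
\frac{1000}{100} = 10 \quad \text{bytes per primitive.}
$$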

The extreme brevity of these descriptions is evidence of 
their “naturalness.” We also note that this brevity makes many 
otherwise difficult tasks relatively simple, e.g., even NP-complete 
problems can be easily solved when the size of the problem is 
small enough. For instance, in animation one would like to be able 
to specify constraints like “x does not intersect y,” “x attached 
to y,” or even “x supports y.” When even complex scenes can 
be described by relatively few “parts” the problem of satisfying 
constraints can be made tractable. 

A quick, responsive system? The correspondence between 
the organization of descriptions made in this representation and 
human perceptual organization means that it is easy to “see” how 
to assemble a 3-D model. It also means that when we try to modify 
or animate an existing model we will likely find that the changes 
we have to make are a simple function of the parameters of our 
model, rather than being, e.g., some hard-to-compute property 
of a collection of polygons or splines. 

Because this part-based representation seems to have the 
right “control knobs” for manipulating 3-D models, it provides 
the basis for surprisingly effortless interaction: it took a 
moderately skilled operator less than a half-hour to assemble the 
face in Figure 4, about five minutes to create the chair in Figure 
1, and less than four hours each (including coffee breaks) to make 
the images in Figure 5. Much of this speed is due to the brevity 
of the final descriptions: to build the scene in Figure 5(a), for 
instance, requires positioning the mouse only 500 times. 

Figure 4. Building a face. 

This performance is in rather stark contrast to more 
traditional 3-D modeling systems that might require several days to 
build up complex scenes such as those shown in Figure 5. This 
performance, perhaps more than any other statistic that could 
be given, illustrates how the close match between this 
representational system and the perceptual organization employed by 
human operators facilitates effective man-machine 
communication. 

6 Summary 

Man-machine interaction requires a representation that 
correctly describes the perceptual organization people impose on the 
stimulus. We have, therefore, presented a representation that 
has proven competent to accurately describe an extensive variety 
of natural forms (e.g., people, mountains, clouds, trees), as well 
as man-made forms, in a succinct and natural manner. The 
approach taken in this representational system is to describe scene 
structure in a manner that is like our naive perceptual notion of 
“a part,” and to allow qualitative description of complex surfaces 
by means of physically- and psychologically-meaningful statistical 
abstractions. 

To implement this system we have devised a user interface 
that allows the user to assemble forms in a natural manner, 
without having to be conscious of the details of either computer 
or program, and without having to move his hands unnecessarily. 
This interface requires real-time feedback; to support this we have 
devised a linear-time hidden line algorithm that allows real-time 
display of two engineering views of the scene on a 68020-class 
machine without need for special hardware. 

Each of the component parts of this representation 
(superquadric “lumps,” deformations, Boolean combination, and the 
recursive fractal construction) has been previously suggested 
as an element of various shape descriptions, usually for other 
purposes. The contribution of this paper is to bring all of these 
separate descriptive elements together as a theory of human 
perceptual organization, and use them as the basis for man-machine 
interaction. In particular, we believe that the following are the 
important contributions this paper makes toward solving the 
problems of building and animating 3-D forms: 

• We have demonstrated that this representational system 
is able to accurately describe a very wide range of natural 
and man-made forms in an extremely simple, and 
therefore useful, manner. 

• We have found that descriptions couched in this 
representation are similar to people’s (naive) verbal 
descriptions and appear to match people’s (naive) perceptual 
notion of “a part.” 

• We have found that by using the fractal construction with 
various primitive elements and fractal scaling parameters 
we can mimic the sort of physically-meaningful statistical 
abstraction that people seem to employ when describing 
the shape of complex surfaces. 

• And finally, we have shown that descriptions framed in the 
representation have markedly facilitated man-machine 
communication about both natural and man-made 3-D 
structures. It appears, therefore, that this representation 
gives us the right “control knobs” for discussing and 
manipulating 3-D forms. 

Finally, however, we believe that the representational 
framework presented here is not complete. It seems clear that 
additional modeling primitives, such as branching structures [24] 
or particle systems [25], will be required to model the way people 
think about objects such as trees, hair, fire, or river rapids. 
Our future work will involve the integration of these primitives, 
together with time and motion primitives, into the framework 
that we have presented here. 

ABSTRACT 

An Artificial Visual System (AVS) has 
been developed to simplify 
three-dimensional microscope images for 
presentation and manipulation in an 
interactive computer graphics system. The 
AVS consists of several sets of spatial 
filters that decompose an image along three 
different measurement continua. A 
recombination algorithm processes the 
filter outputs to detect objects, to 
eliminate noise, and to map the detected 
objects into points in a multidimensional 
feature space. Recent discoveries 
regarding the geometry of the points in the 
feature space are described. One recent 
result simplifies the AVS by decreasing the 
number of filters required to obtain the 
same measurements. Not only are accurate 
measurements possible, but certain image 
distortions can be modelled and 
counteracted in the feature space. 

Key words: computer vision, pattern 
recognition, interactive computer graphics 


Introduction 

Difficulties in the analysis of 
natural images arise from random noise, 
aliasing from the digitization grid, 
systematic distortions such as optical 
blur, and from an excess of information — 
accurate but irrelevant data, relevant but 
ambiguous data, or simply too much relevant 
data — that we call "information 
overload". Many techniques exist for 
correcting noise and distortion [1], but 
the information overload problem requires 
an understanding of the aspects of the 
image that are important for the particular 
application and techniques for specifying 
and extracting the relevant aspects of the 
image. Once the information is extracted, 
then interactive computer graphics can be 
applied to present the extracted 
information and to provide powerful 
interactive facilities to support image 
interpretation activity [2]. 

We have developed an interface between 
image processing and computer graphics 
systems that both provides a mechanism for 
specifying the salient aspects of the image 
and extracts that information in a form 
that can be used to build an interactive 
graphics model of an image. The interface 
takes the form of an artificial visual 
system (AVS). 

The development of the AVS as a means 
for addressing the information overload 
problem has been motivated by a biomedical 
research problem involving interpretation 
of three-dimensional fluorescence images. 
In this paper we will present an overview 
of the biomedical research problem and then 
show how the AVS for this problem was 
designed. Additional applications of AVS 
techniques will be discussed. 


Background 

Our goal is to elucidate the 
contractile mechanism of smooth muscle 
cells [3,4]. One of the proteins believed 
to play a role in that mechanism is 
α-actinin, which occurs in two types of 
discrete bodies of concentration: 
irregular plaques on the cell membrane, and 
oblong bodies distributed throughout the 
cytoplasm and oriented within 30 degrees of 
the long axis of the cell. Organizational 
patterns such as strands of these bodies 
branching and twisting through three 
dimensions may be discerned if the 
locations of the bodies and the 
orientations of the oblong bodies are 
known. Different kinds of organizational 
patterns could support different hypotheses 
of cell structure and function. 

A three-dimensional image of the 
protein distribution in a single, isolated 
cell is obtained by acquiring a series of 
2-D optical sections of the cell using 
Fluorescence Digital Imaging Microscopy 
[3,5]. Several types of noise are 
minimized using averaging and normalizing 
operations during image acquisition [3]. 

There remains a serious optical 
distortion in the direction of focus 
arising from fluorescence sources from 
out-of-focus planes above and below the 
focal plane. This distortion has been 
empirically modelled by imaging a 
fluorescent bead smaller than a voxel. The 
image obtained serves as an empirical 
estimate of the point spread function of 
the overall optical system and is used in a 
constrained iterative restoration procedure 
to reduce the distortion in the cell images 
[3]. The restoration reverses about 80% of 
the optical distortion, but significant 
distortion remains in the direction of 
focus. This residual distortion elongates 
the image of a spherical object in the 
direction of focus yielding an apparent 
axial ratio of about 3:1. 
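
To make the idea of a constrained iterative restoration concrete, the following one-dimensional sketch applies a simple iterative correction with a non-negativity constraint, using a known point spread function. This particular scheme (a van Cittert style update clipped at zero) is an assumption chosen for illustration; the procedure actually used is described in [3] and may differ. The three-tap point spread function and all names are hypothetical.

```python
def convolve_same(signal, kernel):
    """Discrete convolution with zero padding, output the same length as signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - (k - half)
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def restore(observed, psf, iterations=50, gain=1.0):
    """Iteratively sharpen `observed`, keeping every estimate non-negative."""
    estimate = list(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        # Add back the part of the observation the current estimate fails to
        # explain, then enforce the constraint that intensities are >= 0.
        estimate = [max(0.0, e + gain * (o - b))
                    for e, o, b in zip(estimate, observed, blurred)]
    return estimate


# A single bright point blurred by a hypothetical three-tap PSF.
truth = [0.0] * 11
truth[5] = 1.0
psf = [0.25, 0.5, 0.25]
observed = convolve_same(truth, psf)
restored = restore(observed, psf)
```

After a finite number of iterations the estimate is sharper than the observation but not identical to the original, which mirrors the partial reversal of the distortion described above.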

The restored image is still difficult 
to analyze due to information overload. A 
64x64x64 cell image can contain over 200 
discrete concentration bodies. The oblong 
bodies are about 1 voxel wide and about 5 
voxels long, but their oblique orientation 
and the residual distortion spread the 
image of each body over a larger volume. 
Locating the bodies manually is a tedious 
task requiring constant flipping between 
adjacent image planes, correlating traces 
of the bodies through the planes. 
Estimating the orientations of the bodies 
from these traces is even more difficult. 
In addition to the residual optical 
distortion, the small apparent size of the 
bodies makes aliasing from the digitizing 
grid a serious concern; the difference in 
the digitized image between two small 
bodies at nearby orientations involves a 
subtle shift of energy among a few voxels. 

The three-dimensional nature of the 
data adds more information overload. Since 
the structures we seek twist through three 
dimensions, no single two-dimensional view 
can capture all of the relevant information 
about the structures. 

We need, then, a system to simplify 
the restored three-dimensional image by 
locating the protein bodies and determining 
the orientation of the oblong bodies. The 
system we have developed is called a 
three-dimensional artificial visual system. 


Designing an Artificial Visual System 

An Artificial Visual System (AVS) is a 
set of filters along some equivalence 
dimension (e.g. spatial frequency, size, 
orientation) and a recombination algorithm 
for mapping the filter responses into a 
perceptual feature space [4,6]. Defining 
an AVS involves selecting appropriate 
equivalence dimensions, designing filters 
sensitive to different values along those 
dimensions, and defining the recombination 
algorithm to perform the visual task at 
hand. 

The equivalence dimension is a 
continuum along which measurements can be 
made. Objects in the image will be treated 
as stimuli to be represented or measured 
along these continua. Complex structures 
may require measurements on several 
equivalence dimensions to adequately 
characterize the structure of the stimulus. 

A sequence of filters is defined for 
each equivalence dimension. Each filter is 
sensitive to a different range of values 
along the continuum. Normally, the 
sensitivity profiles of the sequence of 
filters on their equivalence dimension are 
designed to be tapered and overlapping. 
Such an ensemble of filters provides a 
unique sequence of responses for every 
single-valued stimulus on the equivalence 
dimension [4,7]. The filter responses 
serve as coordinates of the stimuli in a 
multidimensional feature space [8]. 
Measurements of the stimuli are based on 
the geometry of the mapping of stimuli into 
this feature space. 
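
As a rough illustration of why tapered, overlapping sensitivity profiles give every single-valued stimulus its own response sequence, the sketch below places triangular profiles on a one-dimensional continuum. The triangular shape, the filter centers, and all function names are assumptions made for this example; they are not the filters used in the study.

```python
def triangular_response(value, center, half_width):
    """Response of one tapered filter: 1 at its center, falling linearly to 0."""
    return max(0.0, 1.0 - abs(value - center) / half_width)


def ensemble_responses(value, centers, half_width):
    """Coordinates of a stimulus in the feature space defined by the ensemble."""
    return [triangular_response(value, c, half_width) for c in centers]


def decode(responses, centers):
    """Recover the stimulus value as the response-weighted mean of the centers."""
    total = sum(responses)
    if total == 0:
        raise ValueError("stimulus lies outside the range covered by the filters")
    return sum(r * c for r, c in zip(responses, centers)) / total


CENTERS = [0.0, 10.0, 20.0, 30.0]   # hypothetical filter centers
HALF_WIDTH = 10.0                   # equal to the spacing, so neighbours overlap

responses = ensemble_responses(13.0, CENTERS, HALF_WIDTH)
estimate = decode(responses, CENTERS)
```

With the half-width equal to the spacing, exactly two neighbouring filters respond to any stimulus between the outermost centers, and the ratio of their responses fixes its position.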

The purposes of the filters are as 
follows: (1) to decompose the image into 
separate bands of information so that 
important and useful information can be 
identified more easily; (2) to represent 
the a priori knowledge concerning the 
objects being sought and the precision of 
measurement required; and (3) to define a 
feature space into which the image will be 
mapped [6]. Use of a priori knowledge in 
the filter design enhances both the 
sensitivity and the efficiency of the 
analysis. Sensitivity is enhanced because 
the filters can be tuned to detect the 
structures of interest. The filters can 
also embody the degree of uncertainty in 
the a priori knowledge, as demonstrated in 
the present study where the orientations 
and sizes of the oblong dense bodies are 
known a priori only to an approximation. 
Efficiency is enhanced because the a priori 
knowledge decreases the number of filters 
required for the task; a more generalized 
analysis or higher measurement precision 
requires a finer or more complete 
decomposition of the image, requiring more 
filters. The AVS allows a priori 
information to be incorporated in the 
design of the filters rather than in the 
design of new problem-specific heuristic 
algorithms. 

The recombination algorithm merges the 
information from each set of filters to 
perform the visual task required. The 
algorithm may involve thresholding to 
eliminate noise, averaging to compute a 
measurement, or location of relative extrema 
to create a representation or detect a 
particular kind of stimulus. Other 
recombination algorithms provide edge 
detection or texture representations [6,9]. 
If the objective of the recombination 
algorithm is to create a representation of 
the stimulus to be operated upon later by 
other processes, then the AVS serves as the 
"low-level vision" component of the vision 
system. If the objective of the 
recombination algorithm is to produce the 
required measurement, then the task is 
called a "pre-attentive" operation. 


The AVS for the Cell Study 

For the smooth muscle study, three 
equivalence dimensions are defined: 
length, declination, and azimuth. The 
filters for each dimension are constructed 
in the spatial domain and convolved with 
the 3-D image using Fourier Transform 
methods. The filtering operation produces 
three series of filtered images. 
Fortunately, we can arrange the 
computations so that the filtered images 
can be created, processed, and discarded, 
so it is not necessary to store the entire 
ensemble of 3-D filtered images at once. 

The filters along the length dimension 
are used to locate the bodies, 
differentiate them from noise phenomena in 
the image, and to measure the lengths of 
the bodies. The filters are shaped 
(approximately) as truncated cones of 
length R=3, 5, 7, and 9 voxels [4]. See 
Appendix 2 for details of filter creation. 
The width of the cone is determined by a 
priori information concerning the expected 
variability in the orientations of the 
oblong bodies. 

When convolved with the fluorescence 
image of the cell, a local relative maximum 
response occurs when the filter contains a 
locally maximum fluorescence intensity. 
These local maxima include the centers of 
all of the concentration bodies due to the 
symmetry both of the filters and of the 
bodies, along with maxima caused by noise 
or fluorescence hot spots unrelated to the 
dense bodies we seek. 

The recombination algorithm for 
processing the output of the R series 
filters involves two steps. First, the 
local maximum responses are located and 
thresholded to eliminate minuscule 
fluorescence hot spots and random noise. 
Second, if a real concentration body has 
been found, then the energy captured in the 
sequence of R filters increases until the 
filter size exceeds the size of the body, 
at which point the filter response remains 
constant. Thus, the length measurement 
works as follows: If, at a particular 
location in the image, the smallest filter 
(R=3) has a superthreshold maximum and the 
R=5 filter response at the same location is 
also a superthreshold maximum that is 
significantly (1.2 times) greater than the 
R=3 response, then a valid body (length at 
least 4) has been found. The estimated 
length will be increased to the size of the 
next larger R filter as long as the next R 
filter has a superthreshold maximum at the 
same location with an intensity greater 
(by a factor of 1.2) than the response of the 
current R filter. 
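
The length rule above can be sketched as follows. The function name, the dictionary layout, and the use of `None` for "no local maximum at this location" are assumptions made for the example. The sketch also assumes the reported length is the size of the largest filter that passed the test, which is one reading of the rule.

```python
def estimate_length(responses, threshold, ratio=1.2):
    """Estimate body length from R-filter responses at one candidate location.

    responses maps filter length (3, 5, 7, 9) to the local-maximum response
    at this location, or to None if that filter has no local maximum here.
    Returns the estimated length, or None if no valid body is found.
    """
    sizes = sorted(responses)
    current = sizes[0]
    first = responses[current]
    if first is None or first <= threshold:
        return None                      # smallest filter must be superthreshold
    estimate = None
    for nxt in sizes[1:]:
        r_next = responses[nxt]
        if r_next is None or r_next <= threshold:
            break                        # no superthreshold maximum at this size
        if r_next < ratio * responses[current]:
            break                        # response has stopped growing
        estimate = nxt                   # body is at least as long as this filter
        current = nxt
    return estimate
```

For example, responses of 10, 13, 16 and 16.5 for R=3, 5, 7 and 9 with a threshold of 5 give a length of 7: the step from R=7 to R=9 falls short of the factor 1.2.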

The orientations of the oblong bodies 
are measured in terms of the declination 
(0<=theta<=90) of the long axis of the body 
from the y axis of the 3-D image and the 
azimuth (0<=phi<360) about the y axis using 
the x axis as phi=0. (The 3-D images are 
acquired with the long axis of the cell 
oriented vertically in the microscope 
image, corresponding to the y axis of the 
3-D image.) 

The filters for the theta and phi 
equivalence dimensions are constructed by 
partitioning the R filters into "hollow 
cones" for theta measurements or "wedges" 
for phi measurements [4]. Theta filters 
are centered at theta=0, 5, 10, 15, 20, 25, 
and 30 degrees. Phi filters are centered 
at phi=0, 60, 120, 180, 240, and 300 
degrees. See Appendix 2 for details of the 
creation of the phi filters. The R filters 
at R=5, 7, and 9 are thus partitioned into 
theta and phi filters giving a total of 40 
filters covering all combinations of values 
in all three equivalence dimensions. 

The 36 theta and phi filters are 
convolved with the input image. For each 
body identified by the R filters, we record 
the responses at the body center of the 
twelve theta and phi filters corresponding 
to the size of the body. We use only the 
theta and phi filters best matching the 
body size to avoid incorporating responses 
to nearby structures or noise into the 
orientation measurements. (There is some 
evidence that responses to fluorescence 
signals outside the body can be eliminated 
mathematically without using separate 
filter sequences for each possible length. 
We plan to investigate this possibility 
later.) These filter responses form two 
six-dimensional vectors that characterize 
the theta and phi orientations of the 
bodies. The feature vectors are normalized 
by the responses to the R filter matching 
the body's size so that the sum of the 
values in each feature vector is 1. 

The oblong bodies have a single 
preferred orientation (by virtue of their 
oblong shape), so the sequence of responses 
to overlapping, tapered filters defined 
along the theta and phi equivalence 
dimensions is unique for every possible 
orientation. In fact, the normalized 
response to each filter may be used as a 
weighting factor indicating the degree to 
which the body's spatial energy 
distribution matches the filter's preferred 
orientation. The recombination algorithm 
for constructing an estimate of the 
orientation of the body involves 
(circularly) averaging the filter center 
orientations weighted by the responses of 
the filters. This is equivalent to a sum 
of vectors whose polar representation (r, 
alpha) has the r component equal to the 
normalized filter response for the filter 
centered at phi=alpha. 
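
A minimal sketch of this circular weighted average for the six phi filters is given below. The function and constant names are illustrative. The responses are normalized here by their own sum, which assumes (as Appendix 2 states) that the R filter response equals the sum of the phi filter responses.

```python
import math

PHI_CENTERS = [0, 60, 120, 180, 240, 300]   # phi filter centers in degrees


def circular_weighted_mean(centers_deg, responses):
    """Sum vectors at the filter centers, each scaled by its normalized response.

    Returns (r, alpha_deg): the length and polar angle of the resultant vector.
    """
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses must have a positive sum")
    weights = [w / total for w in responses]          # weights now sum to 1
    x = sum(w * math.cos(math.radians(c)) for w, c in zip(weights, centers_deg))
    y = sum(w * math.sin(math.radians(c)) for w, c in zip(weights, centers_deg))
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0
```

Averaging as vectors rather than as plain numbers handles the wrap-around at 0/360 degrees: equal responses from the filters at 300 and 0 degrees give 330 degrees, not 150.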


Application and Evaluation of the AVS 

The performance of the AVS has been 
extensively studied using artificial images 
containing model cylindrical bodies at 
regularly spaced orientations. Model 
bodies are created at theta angles 0, 5, 
10, 15, 20, 25, and 30. For each theta 
angle (except theta=0) bodies are created 
with phi angles 0, 30, 60, 90, 120, 150, 
180, 210, 240, 270, 300, and 330 degrees. 
These images have been analyzed themselves, 
and they have been distorted using the 
empirically determined point spread 
function of the microscope system and 
partially restored using the iterative 
restoration algorithm before further 
analysis. 

The objective of this study is to 
determine from the regular sampling of body 
angles whether the mapping of the bodies 
into the feature spaces displays similar 
regularity. If so, then measurements of 
body orientations may be reliably performed 
from the feature space. 

With the noiseless images, the R 
filters correctly locate the bodies, and 
application of the circular weighted 
averaging procedure on the sequences of 
theta and phi filter responses obtains 
orientation measurements correct within 2 
degrees in theta and 5 degrees in phi. The 
errors remaining are due to aliasing 
effects and roundoff errors. 

With the blurred/restored images, the 
interpretation of the filtered images is 
far less straightforward. The residual 
z-axis distortion, which elongates the 
images in the phi=90 and phi=270 
directions, ruins the theta filter 
measurements. The problem appears to be 
that the z-axis distortion causes theta 
measurements of bodies at a fixed theta 
orientation to vary through a wide range of 
values as phi varies. Thus, bodies at 
different theta orientations are 
indistinguishable unless the phi angle is 
already known. We could compute the phi 
angle first and establish for each phi 
angle appropriate thresholds for 
interpreting the theta data, but a more 
elegant approach has been found. 

We have discovered that both the theta 
and phi measurements can be reliably 
obtained from the phi filter data alone. 
This simplification is possible because as 
the theta angle of a body increases, an 
increasing proportion of the volume of the 
body moves away from the theta=0 axis, 
resulting in increased energy in a 
preferred phi direction. Therefore, we can 
measure phi by computing the circular 
weighted average of the phi filter 
responses, and we can compute theta by 
measuring the intensity of the phi angle 
preference. 

When this approach is applied to 
noiseless model images, the pattern of 
points in the feature space is a series of 
concentric circles. Each circle 
corresponds to a particular theta value, 
and the polar angle alpha of each point 
along the circle is precisely the phi angle 
of the corresponding body. 

When the same approach is applied to 
blurred/restored bodies, the pattern of 
responses in the feature space is a series 
of concentric ellipses. The minor axes are 
in the phi=90 and phi=270 directions, 
corresponding to the direction of the z 
axis distortion in the images. The effect 
of the distortion on the feature space, 
then, is to decrease the sensitivity of the 
filters in the direction of the distortion. 
We can correct this problem in the feature 
space by converting the polar (r,alpha) 
coordinates to Cartesian form and boosting 
the y component of the Cartesian vectors. 
The scaling factor was determined by 
measuring the eccentricities of the 
ellipses in the feature space and for our 
current data is about 3.2, which 
corresponds to the axial ratio induced by 
the residual distortion in the image of a 
sphere. This scaling factor must be 
recomputed only if the imaging system 
changes. 
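
The correction can be sketched as below. The function name is illustrative, and the default factor of 3.2 is simply the value quoted above for the authors' data; another imaging system would need its own measured value.

```python
import math


def correct_feature_point(r, alpha_deg, y_scale=3.2):
    """Undo the compression of a feature-space point along the y direction.

    (r, alpha_deg) is the point in polar form. The point is converted to
    Cartesian form, its y component is boosted, and the result is returned
    in polar form again.
    """
    x = r * math.cos(math.radians(alpha_deg))
    y = r * math.sin(math.radians(alpha_deg)) * y_scale   # boost the y component
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0
```

Applied to points lying on an ellipse whose minor axis is 3.2 times shorter than its major axis, this maps them back onto a circle, restoring both the radius and the polar angle.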

Evaluation of the approach is carried 
out by measuring the angle between the 
actual and estimated body orientation 
vectors. That is, the (theta,phi) values 
of a set of model bodies and the 
corresponding estimates are converted to 
Cartesian form and the angle between the 
two vectors is computed. The average 
error angle on our model images is less 
than 2 degrees. Additional tests are in 
progress on noisy model images. 
Preliminary results indicate that in the 
presence of noise, the errors in the 
orientation measurements increase gradually 
as the signal/noise ratio decreases. 
Further work on interpreting these results 
is in progress. 
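
The error measure can be written down directly from the conventions of Appendix 1 (theta measured from the y axis, phi measured about it from the x axis). The function names below are illustrative.

```python
import math


def orientation_to_vector(theta_deg, phi_deg):
    """Unit vector for a (theta, phi) orientation, with theta=0 along the y axis."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.cos(t), math.sin(t) * math.sin(p))


def error_angle(actual, estimate):
    """Angle in degrees between two orientations given as (theta, phi) pairs."""
    a = orientation_to_vector(*actual)
    b = orientation_to_vector(*estimate)
    dot = sum(ai * bi for ai, bi in zip(a, b))
    dot = max(-1.0, min(1.0, dot))   # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))
```

Measuring a single angle between vectors avoids treating theta and phi errors separately, which would overstate phi errors for bodies near theta=0, where phi is poorly defined.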


Interfacing with a Graphics System 

The information extracted by the AVS 
consists of a list of (x, y, z) locations 
where a dense body was found along with (r, 
theta, phi) measurements on each of the 
oblong dense bodies. This position and 
orientation data has been used to create a 
graphic model of the 3-D distribution of 
dense bodies. The bodies are represented 
as lozenge-shaped solid objects having the 
measured length and orientations. The cell 
image is created by projecting prototype 
bodies into space and then subjecting the 
projected model to the required viewing 
transformations [2]. 

The user of the graphics system can 
specify any view position, including 
positions inside the graphic model that 
correspond to positions inside a cell. The 
three-dimensional distribution of the dense 
bodies can be explored by marking (in 
color) particular dense bodies in order to 
trace a network or follow a strand of 
bodies through the cell [4]. 

Interaction is provided by a 3-D 
wire-frame arrow cursor whose movement is 
controlled by a three-dimensional joystick. 
Joystick movements can be interpreted as 
translation or rotation commands depending 
on a switch setting. 

At present, a single view from a 
single viewpoint is presented by the 
graphics system [4]. We are considering 
implementation of a dual viewport system 
that could enable presentation of the view 
as a stereo pair or as a proximal/distal 
view pair. 


Discussion 

The artificial visual system has 
proven to be a powerful tool for image 
analysis due to the following properties 
[4,6]: 

1. A priori knowledge can be 
effectively incorporated into the design of 
the filters and the recombination 
algorithm. The artificial visual system 
can be tuned and experiments can be 
performed by changing only the filter data 
and not the analysis algorithms. This 
enables rapid prototyping and optimization 
of the artificial visual system with a 
minimum of reprogramming and algorithm 
development. In addition, since a few 
simple algorithms suffice for much of the 
processing requirements, special devices 
such as array processors can be brought to 
bear to enhance the speed of execution. 

2. Spatial filtering is intuitively 
understandable. Filters can be defined 
either in the spatial domain or in the 
frequency domain. Either way, the filters 
and their effects on images can be 
determined and understood easily since the 
filter is applied uniformly over the image. 
Understandability is especially important 
in applications since decisions will be 
based on the results of the computer 
procedures and those decisions must be 
defended based on an understanding of the 
computer's results. 

3. Fast algorithms exist for 
performing spatial filtering. Spatial 
domain convolution, spatial frequency 
domain multiplication, recursive filtering, 
and in-place filtering are all well-known 
algorithms for performing spatial 
filtering. Depending on the hardware 
support and the nature of the filtering to 
be performed, any of these equivalent 
algorithms can be chosen. These algorithms 
are amenable to parallel processing to 
enhance execution speeds. 


4. The most important property of 
spatial filtering is that a suitably 
constructed ensemble of filters can be used 
to decompose an image along any of several 
continua (e.g. size, orientation, spatial 
frequency, shape, etc.). Thus, the 
ensemble of filters in a visual system can 
be constructed so as to define a meaningful 
feature space. 

Moreover, a stimulus can be located 
along a continuum (orientation, size, 
spatial frequency) by an ensemble of 
tapered, overlapping filters. With 
suitably defined filters, every possible 
stimulus along the continuum yields a 
unique pattern of responses from the 
ensemble of filters, and thus a unique 
location in the feature space. Increased 
accuracy in the measurements of stimuli 
requires a finer decomposition of the 
relevant continuum involving a larger 
number of more narrowly-defined filters. 
Thus, the tradeoff between cost and 
measurement accuracy is explicit and 
measurable. 


Conclusion 

An Artificial Visual System has been 
developed to simplify three-dimensional 
fluorescence microscopy images. The AVS 
locates bodies of interest in the 3-D 
image, discriminates the bodies from noise, 
and measures the 3-D orientation of each 
body. The measurements are made by using 
the outputs of a series of spatial filters 
to map each body into a point in an 
abstract feature space. The geometry of 
the mapping allows the orientation angles 
to be computed directly from the mapping. 

Moreover, distortions in the 3-D image 
due to the image acquisition system that 
were not corrected by noise reduction or 
image restoration algorithms appear as 
systematic distortions of the geometry of 
the feature space. This residual 
distortion can be measured and corrected in 
the feature space, enabling accurate 
measurements in spite of the imperfections 
in the image data. 

The measurements are then used to 
create a simplified graphical image that 
can be viewed and manipulated using 
interactive graphics tools. Viewpoints 
corresponding to locations inside a cell 
may be constructed. The graphics system 
user can interact with the simplified image 
to record organizational patterns that may 
explain the operation of the contractile 
machinery of the cell. 


Appendix 1: Coordinate Conversions 

This appendix gives the algorithms for 
converting Cartesian vectors to (theta,phi) 
orientation vectors and vice versa where 
the (0,0) direction is the y axis and the 
(90,0) direction is the x axis. 


Convert a (theta,phi) orientation to a 
3-D Cartesian unit vector [x,y,z] as 
follows: 


x = sin(theta)*cos(phi) 
y = cos(theta) 
z = sin(theta)*sin(phi) 

The following algorithm converts a 3-D 
Cartesian vector [x,y,z] to a (theta,phi) 
orientation vector: 


Let r1    = sqrt(x^2+z^2) 

    theta = arctan(r1/y)   if y is not zero 
            0              if y=0 and r1=0 
            90             if y=0 and r1>=0 

    phi   = arctan(z/x)    if x is not 0 
            0              if x=0 and z=0 
            90             if x=0 and z>=0 
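
The two conversions can be sketched as below, taking r1 to be the distance from the y axis, sqrt(x^2+z^2), as the forward formulas imply. The function names are illustrative. The sketch uses the two-argument arctangent instead of the case analysis above, which is a substitution on our part: it handles the zero cases in the same way and also resolves the quadrant of phi, so phi values beyond 90 degrees are recovered.

```python
import math


def to_cartesian(theta_deg, phi_deg):
    """(theta, phi) orientation to a 3-D unit vector [x, y, z]."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.cos(t), math.sin(t) * math.sin(p))


def to_orientation(x, y, z):
    """3-D Cartesian vector to (theta, phi) in degrees."""
    r1 = math.hypot(x, z)                      # distance from the y axis
    theta = math.degrees(math.atan2(r1, y))    # 0 along +y, 90 in the x-z plane
    # phi is undefined on the y axis; 0 is returned there by convention.
    phi = math.degrees(math.atan2(z, x)) % 360.0 if r1 > 0 else 0.0
    return theta, phi
```

A round trip through both functions returns the original angles for any theta between 0 and 180 degrees and any phi in [0, 360), apart from rounding.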


Appendix 2: Filter Definitions 

Filters defined as geometric cones and 
segments of cones [4] were found to be 
subject to errors such as false positives 
or mislocated maxima. Superior results 
have been obtained with filters created by 
integrating images of a cylinder rotated 
about its center. The resulting filters 
have higher sensitivity where many of the 
rotated cylinder images overlap (i.e. at 
the filter center). 

Each filter is formed by a weighted 
sum of images of a rotated cylinder 9 
voxels long and 1 voxel in diameter 
corresponding to the longest apparent size 
of the dense bodies in our images. The 
weighting factor applied to each cylinder 
is based on the difference between the 
cylinder orientation (t,p) and the filter's 
preferred orientation. We define two 
utility functions as follows: 

WT(t,p;tmax) = 1   if t<=tmax 
             = max(0,1-[(t-tmax)/10]) 
                   if t>tmax 

WP(t,p;pcen) = max(0,1-[|pcen-p|/60]) 

The WT function assigns a weight of 1 to 
cylinders whose theta orientation is less 
than or equal to tmax and attenuates 
cylinders with larger theta values with the 
weight decreasing linearly with (t-tmax) 
and reaching 0 at a theta orientation of 
tmax+10 degrees. The WP function 
attenuates the cylinder images linearly as 
the phi orientation differs from pcen with 
the weight reaching zero 60 degrees away 
from pcen. Note that the difference 
(pcen-p) must be computed mod 360. Now we 
use the utility functions to define the 
cylinder weighting functions for an R 
filter (a cone) and filters P1-P6 
(wedge-shaped phi filters): 

R(t,p)  = WT(t,p;30) 

P1(t,p) = WT(t,p;30)*WP(t,p;0) 

P2(t,p) = WT(t,p;30)*WP(t,p;60) 

P3(t,p) = WT(t,p;30)*WP(t,p;120) 

P4(t,p) = WT(t,p;30)*WP(t,p;180) 

P5(t,p) = WT(t,p;30)*WP(t,p;240) 

P6(t,p) = WT(t,p;30)*WP(t,p;300) 

The R filter is equal to the sum of the Pi 
filters, making the response of the R 
filter a reasonable normalization factor 
for the sequence of Pi responses. This 
normalization eliminates the effects of 
different overall intensity in different 
bodies. 
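
The weighting functions are simple enough to check numerically. In the sketch below the Python names are illustrative, and the unused arguments of WT and WP are dropped. It confirms that, for any cylinder orientation, the six phi weights sum to the R weight, which is what makes the R response a suitable normalization factor.

```python
PHI_CENTERS = [0, 60, 120, 180, 240, 300]   # centers of filters P1..P6


def wt(t, tmax):
    """Theta weight: 1 up to tmax, then falling linearly to 0 at tmax + 10."""
    if t <= tmax:
        return 1.0
    return max(0.0, 1.0 - (t - tmax) / 10.0)


def wp(p, pcen):
    """Phi weight: 1 at pcen, falling linearly to 0 at 60 degrees away."""
    diff = abs(pcen - p) % 360.0
    diff = min(diff, 360.0 - diff)           # angular difference taken mod 360
    return max(0.0, 1.0 - diff / 60.0)


def r_weight(t, p, tmax=30.0):
    """Cylinder weight for the R (cone) filter; independent of p."""
    return wt(t, tmax)


def p_weights(t, p, tmax=30.0):
    """Cylinder weights for the six wedge-shaped phi filters."""
    return [wt(t, tmax) * wp(p, pcen) for pcen in PHI_CENTERS]
```

Because the phi profiles are triangles whose half-width equals their 60-degree spacing, exactly two of them are nonzero at any phi between centers and their weights sum to 1.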

Shorter filters are created by 
multiplying the above filters by a sphere 
of an appropriate radius. R and Pi filters 
have been created at lengths 3, 5, 7, as 
well as 9. When the size of a body is 
determined, the P series filters 
corresponding to that size are used to 
estimate the theta and phi angles. 


Abstract 

Using the modified linear quadtree proposed in 
[1,9], this paper presents an O(n·N) algorithm for 
labeling connected components of a region consisting 
of N BLACK nodes in a 2^n by 2^n binary image. As a 
direct application of the algorithm, a method for 
computing the perimeter of a region is also described. 

1. INTRODUCTION 

The identification of all connected components of a 
region is a fundamental operation in image processing 
and geographic systems [4, 5]. Samet [6] presents an 
algorithm for labeling all connected components of a 
region represented by a quadtree, and shows that its 
average execution time is O(T + N·log N), where T 
and N are the total number of nodes and the number of 
BLACK nodes in the quadtree, respectively. That 
algorithm outperforms the traditional method which 
has an execution time proportional to the number of 
pixels of the image [5]. Gargantini [3] also describes 
an entirely different algorithm using a linear quadtree 
[2]; however, that algorithm has limited power as it is 
only applicable to regions with very special 
configurations. 

In this paper, an algorithm adopting a novel 
approach for labeling all connected components of a 
region using a Modified Linear Quadtree (MLQ) is 
presented. It is capable of handling regions with 
arbitrary configurations. Furthermore, the algorithm 
is of time complexity O(n·N), and hence compares 
favorably to Samet's algorithm [6]. As an application 
of the algorithm, this paper will show that, with the 
same time complexity, the perimeter of a region can 
also be computed. 

2. DEFINITIONS AND NOTATION 

This section contains some basic definitions and 
terminology for region representations that are 
fundamental for the remainder of this paper. 

*This research was supported in part by the Natural 
Sciences and Engineering Research Council of Canada 
under Grant NSERC A7634. 

Definition 1: An image is a 2^n by 2^n array of unit 
square pixels each of which can assume one of 2^k 
values, where n is called the resolution parameter of 
the image. 

Definition 2: An image is called a binary image 
when its pixels assume either 1 or 0 values. A pixel is 
BLACK if it has the value of 1, otherwise it is WHITE. 

Without loss of generality, only binary images will 
be considered in this paper since all of the algorithms 
can be extended to nonbinary images. 

Definition 3: The region of a binary image is 
composed of all BLACK pixels, and the background of 
the region is composed of all WHITE pixels. 

Definition 4: Let (i, j) represent the location of a 
pixel p in a given image, where i and j are the column 
and row positions respectively. Then p has four 
horizontal and vertical neighbors located at: (i-1, j), 
(i, j-1), (i, j+1) and (i+1, j). These pixels are called 
the 4-neighbors of p, and are said to be 4-adjacent to p. 

Definition 5: For two BLACK pixels, p and q, of a 
region, p is said to be connected to q if there is a path 
from p to q consisting entirely of pixels of the region. 

Definition 6: For any BLACK pixel p, the set of 
pixels connected to p is called a connected component 
of the region. If a region has only one component, 
then it is called "connected". 

Based on the principle of recursive decomposition, 
an image is decomposed in the following manner to 
separate a region from its background [10]. If the 
region does not cover the entire binary array, the 
array will be subdivided into four equal-sized square 
blocks. This process will be applied recursively, until 
blocks are obtained that are either totally contained in 
the region or totally disjoint from it. The recursive 
decomposition of an image produces blocks that must 
have standard sizes (powers of 2) and positions. 
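
The decomposition can be sketched as a short recursive function. This is a naive illustration only: the function name, the row-major list-of-lists image layout, and the (i, j, s) block tuples are assumptions made for the example, and the sketch rescans pixels at every level rather than aiming for efficiency.

```python
def decompose(image, i0=0, j0=0, size=None):
    """Return the BLACK blocks of a 2^n by 2^n binary image.

    image[j][i] is the pixel in column i, row j (1 = BLACK, 0 = WHITE).
    Each block is reported as (i, j, s): the column and row of its corner
    pixel with the smallest coordinates, and its resolution parameter s,
    meaning the block covers 2^s by 2^s pixels.
    """
    if size is None:
        size = len(image)
    pixels = [image[j][i]
              for j in range(j0, j0 + size)
              for i in range(i0, i0 + size)]
    if all(pixels):
        return [(i0, j0, size.bit_length() - 1)]   # block lies inside the region
    if not any(pixels):
        return []                                  # block is disjoint from it
    half = size // 2
    blocks = []
    for dj in (0, half):                           # subdivide into four quadrants
        for di in (0, half):
            blocks += decompose(image, i0 + di, j0 + dj, half)
    return blocks
```

A single pixel is always either BLACK or WHITE, so the recursion stops at size 1 at the latest, and every block returned has a power-of-two size and an origin aligned to that size.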

Similar definitions can now be formulated in terms of 
blocks. 

Definition 7: A block is said to be BLACK if it 
contains only BLACK pixels, WHITE if it contains 
only WHITE pixels, and GREY if it contains both 
BLACK and WHITE pixels. 

The four sides of a block are referred to as its 
North, East, South and West sides, or N, E, S and W 
for short. Let OPSIDE(T) be the side opposite to T, 
e.g., OPSIDE(E)=W. 

Definition 8: Two blocks P and Q are said to be 
4-adjacent along the side T of P if the side T of P 
touches the side OPSIDE(T) of Q. 

Definition 9: BLACK blocks P and Q are said to be 
connected if there exists a path consisting entirely of 
BLACK pixels from a pixel of P to a pixel of Q. 

Definition 10: For two integers I and J given by 

    I = sum_{i=0}^{n-1} I_i*2^i  and  J = sum_{i=0}^{n-1} J_i*2^i, 

where I_i, J_i are in {0,1}, 

    SHUFFLE(I,J) = sum_{i=0}^{n-1} (I_i*2 + J_i)*4^i. 

To represent a block obtained by the recursive 
decomposition method requires the following 
definition: 

Definition 11: The key of a block or node with 2^s 
by 2^s pixels is SHUFFLE(I, J), where (I, J) is the 
location of its left bottom pixel, and s is the resolution 
parameter of the block. 

It is now easy to show that the two-tuple <K,s> 
uniquely represents a block, where K and s are the key 
and resolution parameter of the block, respectively. 
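
SHUFFLE interleaves the bits of I and J, with each bit of I placed above the matching bit of J. A direct transcription of Definition 10 is sketched below; the function name and argument order are illustrative.

```python
def shuffle(i, j, n):
    """Interleave the low n bits of i and j as in Definition 10.

    Bit b of i and bit b of j together form base-4 digit b of the result,
    with the bit of i as the high bit of that digit.
    """
    key = 0
    for b in range(n):
        i_bit = (i >> b) & 1
        j_bit = (j >> b) & 1
        key += (2 * i_bit + j_bit) * 4 ** b
    return key


# A 2x2 block at the origin and its own corner pixel share the key 0,
# so the resolution parameter is needed to tell them apart.
block = (shuffle(0, 0, 3), 1)
pixel = (shuffle(0, 0, 3), 0)
```

This is why the two-tuple <K,s>, and not the key alone, identifies a block: a block and the corner pixel it is keyed by always share the same K.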

A modified linear quadtree (MLQ) is defined 
to be a sequence of BLACK nodes in two-tuple form 
sorted in ascending key order. This differs from the 
usual definition of a linear quadtree in that the key of 
the node is stored as a single integer rather than as an 
n-digit quaternary code, and the resolution parameter 
of the node is given explicitly rather than implied by 
the number of don’t care characters in the quaternary 
code. This modification results in space efficiency and 
improved execution time [9]. 

In presenting the connected component labeling 
algorithm, each BLACK node in the MLQ is stored as 
a record consisting of three fields. The first two 
fields, termed KEY and RES, contain the key and the 
resolution parameter of the node, respectively. The 
third field, termed ID, identifies the connected 
component containing the node. It is set as a result of 
the algorithm to be presented. An array M is used to 
represent the MLQ. Therefore, M has the property 
that for any 
i, j in {1,2,...,N}, if i < j then M[i].KEY < M[j].KEY. 

The predicate UNEXPLORED(P,T) is true if 
and only if the side T of node P has not been marked 
"explored" in the progress of the algorithm. The 
predicate LABEL(P) is true if and only if P.ID has 
been assigned a value. 

3. AN OBSERVATION 

Given a node P in M, its four adjacent or 
neighboring nodes can be determined in O(n) steps 
[1,9]. Suppose Q is the adjacent node to P in the west 
direction. The color of Q can be determined as 
WHITE, BLACK or GREY in O(log N) time [1,9]. 
The BLACK or WHITE color of Q provides the 
information regarding whether Q is connected to P or 
not. Very little is known, however, about the 
adjacencies between P and Q when the color of Q is 
GREY, simply because Q may contain no BLACK node 
adjacent to P or as many as 2^s of them, where 
s = P.RES, i.e., P is a block of 2^s by 2^s pixels. 

This implies that up to 2^s further searches on M 
must occur in order to exhaust all possible adjacencies. 
In fact, this is precisely what Samet's algorithm does. 
Assuming a random image, in the sense that a node is 
equally likely to appear in any position and at any level 
in the quadtree, the neighbor finding operation using a 
quadtree is so efficient that the average number of 
nodes visited is a constant [8]. Correspondingly, the 
neighbor finding operation using a linear quadtree is 
less efficient in that the average number of nodes 
visited is O(log N) [2]. Therefore, a connected 
component labeling algorithm using a linear quadtree 
cannot efficiently follow the same exhaustive strategy 
as Samet's algorithm. 

Gargantini's algorithm [3] imposes a special 
configuration on the region to avoid performing an 
exhaustive search. As a result, the algorithm is not 
able to deal with regions with arbitrary 
configurations. Clearly, the crucial step in 
achieving an efficient method is to determine how, 
when Q, the adjacent node of P, turns out to be GREY, 
further searching on M can be precluded without losing 
any information regarding the adjacencies. 

It is this observation that leads to a new method, to 
be described in the next section, for labeling all 
connected components of a region using an MLQ. 

4. AN INFORMAL DESCRIPTION 

The connected component labeling algorithm has 
three phases. An array called MAP will be used 
mainly by the first phase. MAP is constructed from M 
such that for any two integers i, j in {1,2,...,N}, 
if i < j then M[MAP[i]].RES <= M[MAP[j]].RES. In 
essence, MAP allows the nodes in M to be visited 
in ascending size order. 
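
One way to build MAP is to sort the indices of M by resolution parameter. In the sketch below the `Node` record and `build_map` are illustrative names, the ID field is omitted for brevity, and indices start at 0 rather than 1.

```python
from collections import namedtuple

Node = namedtuple("Node", ["key", "res"])   # KEY and RES fields of an MLQ record


def build_map(m):
    """Indices into m, ordered so that resolution parameters never decrease."""
    return sorted(range(len(m)), key=lambda idx: m[idx].res)


# A small MLQ, already sorted by key as the definition of M requires.
m = [Node(0, 1), Node(5, 0), Node(7, 0), Node(16, 2)]
node_map = build_map(m)
```

Python's `sorted` is stable, so nodes of equal size keep their key order. It costs O(N log N); since RES takes only n+1 distinct values, a bucket pass over M would give the same ordering in O(N + n).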

The first phase explores all possible adjacencies 
between any pair of BLACK nodes in M and generates 
equivalence pairs. The second phase merges all the 
equivalence pairs generated during phase one into 
equivalence classes. Finally, the third phase assigns 
the same identifier (i.e., the label) to those BLACK 
nodes that belong to the same equivalence class to 
reflect a connected component. 
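
Phases two and three can be illustrated with a union-find (disjoint-set) structure. This is a standard technique substituted here for illustration; the text above does not say how the merging is implemented, and the function name and data layout are assumptions.

```python
def merge_equivalences(labels, pairs):
    """Map every provisional label to the representative of its class.

    labels: iterable of the provisional labels assigned in phase one.
    pairs:  iterable of (a, b) equivalence pairs generated in phase one.
    """
    labels = list(labels)
    parent = {label: label for label in labels}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving keeps the trees shallow
            a = parent[a]
        return a

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)   # smaller label becomes representative
    return {label: find(label) for label in labels}
```

Phase three then amounts to replacing the ID of each node with the representative of its provisional label, so that all nodes of a component end up with the same identifier.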

In particular, phase one traverses M in ascending 
size order. For each BLACK node P in M being 
visited, and T in {N,E,S,W}, if the side T of P has not 
been previously marked, then the adjacency between 
node P and the BLACK node of greater or equal size 
along the side T of P needs to be explored. If such a 
BLACK node, say Q, indeed exists in M, then the side 
OPSIDE(T) of Q is marked "explored" and Q is 
assigned the same label as P to indicate that both 
P and Q belong to the same component. Depending on 
the configuration of the region under consideration, Q 
may already have been assigned a label different from 
that of P, in which case, an equivalence pair consisting 
of the two labels is generated. This equivalence pair 
will be used in the later stages of the algorithm to 
update the labels of P and Q so that eventually they 
will be assigned the same label. If the side T of P has 
already been marked "explored", then the exploration 
of the adjacency to the side T of P is no longer needed. 

The consequence of this technique is not merely to 
save one search on M, but to remove the need 
to exhaust all possible adjacencies along the side T 
of P. The reason for this is as follows. The side T of 
P can be marked "explored" only at the time when that 
side of P was found to be connected to a BLACK node 
that was being visited by the algorithm. This BLACK 
node cannot be bigger than P, for 
otherwise it would not be visited before P. As a 
matter of fact, there could be as many such BLACK 
nodes as the size of P in M. Regardless of how many 
BLACK nodes of this nature exist, the "explored" 
status of the side T of P, while P is being visited, 
simply indicates that the exploration of the adjacencies 
across the side T has already been done. 

The distinct feature of this algorithm is that phase 
one guarantees that, at most, one exploration of an 
adjacency along each side of every BLACK node in M 
is sufficient to discover all possible adjacencies 
between any pair of BLACK nodes. To see this, 
remember that phase one visits the nodes in M in 
ascending size order. Consider, for example, the 
image in Fig. 1, where the resolution parameter n is 
3. By the time BLACK node A is visited, its eastern 
adjacency need not be re-explored, since BLACK 
nodes E, D, C and B have already been visited before 
A, and the adjacencies were discovered at that time. 
Now, however, its northern adjacency must be 
explored, since that side of A cannot be marked 
"explored" although F was visited before A. As A's 
northern neighbor of equal size is found to be GREY, 


the algorithm immediately concludes that there does 
not exist a BLACK node adjacent to the northern side 
of A, for otherwise the northern side of A would have 
been marked "explored". Therefore, no further 
search is necessary. 

Phase two will merge the equivalence pairs 
generated during phase one into equivalence classes in 
such a way that each equivalence class contains all 
labels assigned to those BLACK nodes that form a 
connected component. 

Finally, phase three updates the labels assigned to the 
BLACK nodes during phase one using the equivalence 
classes generated by phase two. Upon completion of 
phase three, all BLACK nodes of a connected 
component share a single label that is unique to that 
component. 


Fig. 1. An Adjacency Configuration. Fig. 2. A Region With 2 Components. 

5. THE FORMAL ALGORITHM 

The connected component labeling algorithm will 
now be specified by the following procedures. 
Actually, only those procedures that correspond to 
phases one and three will be presented. Phase two can 
be achieved by using the well known UNION-FIND 
algorithm [11]. The main procedure is named 
LABEL-CC, and invoked with an array M and an 
integer N corresponding to the number of BLACK 
nodes in M. Steps 1 and 2 construct the MAP and 
initialize a list called E-list which will contain the 
equivalence pairs as they are generated. Procedure 
PROPAGATE implements phase one. It visits the 
nodes in M in ascending size order through MAP, 
explores the adjacencies between pairs of BLACK 
nodes by invoking EXPLORE, assigns labels 
produced by ID-GENERATOR, and accumulates 
equivalence pairs in the E-list. Procedure 
EQ-NEIGHBOR, used by EXPLORE, computes the key 
of M[j]'s equal-sized neighbor in the direction 
specified by the parameter side. The unspecified 
procedure SEARCH(M, P) works as follows: if P is 
a BLACK node then SEARCH returns an integer 
value k such that P is either equal to or contained in 
M[k]. However, if P is WHITE or GREY then 
SEARCH simply returns a zero. Unique labels are 
generated by procedure ID-GENERATOR, and 
assigned to BLACK nodes by procedure 
ASSIGN-LABEL. Procedure UPDATE implements phase 
three by uniquely labeling each component while 
scanning M. 

Procedure LABEL-CC(M, N) 
begin 
  1  construct MAP; 
  2  E-list := {}; 
  3  PROPAGATE(M, N); 
  4  generate equivalence classes from E-list; 
  5  UPDATE(M, N); 
end; 

Procedure PROPAGATE(M, N) 
begin 
  for i := 1 to N do 
  begin 
    j := MAP[i]; 
    for side in {N,E,S,W} do 
      if UNEXPLORED(M[j], side) 
        then EXPLORE(M, j, side); 
    if not LABEL(M[j]) then 
      M[j].ID := ID-GENERATOR; 
  end; 
end; 

Procedure EXPLORE(M, j, side) 
begin 
  neighbor := EQ-NEIGHBOR(M[j], side); 
  k := SEARCH(M, neighbor); 
  if k > 0 then 
  begin 
    mark OPSIDE(side) of M[k] "explored"; 
    ASSIGN-LABEL(M[j], M[k]); 
  end; 
end; 

Procedure ASSIGN-LABEL(node, adj) 
begin 
  if LABEL(node) and LABEL(adj) then 
    begin 
      if node.ID ≠ adj.ID 
        then add (node.ID, adj.ID) to E-list 
    end 
  else if LABEL(node) 
    then adj.ID := node.ID 
  else if LABEL(adj) 
    then node.ID := adj.ID 
  else node.ID := adj.ID := ID-GENERATOR; 
end; 

Procedure UPDATE(M, N) 
begin 
  for i := 1 to N do 
    M[i].ID := FIND(M[i].ID); 
end; 
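
The procedures above can be sketched in runnable form as follows. This is only an illustration under stated assumptions: each BLACK block is written as a tuple (x, y, res) giving its lower-left pixel and resolution, a Python set stands in for the sorted array M, and `search` climbs through the enclosing blocks instead of performing the lookup that SEARCH does on M. None of these representation choices come from the paper.

```python
OFFSET = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPSIDE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def label_components(blocks, n):
    """Label BLACK blocks (x, y, res) of a 2**n by 2**n image by component."""
    black = set(blocks)
    explored = {b: set() for b in blocks}  # sides already marked "explored"
    label, pairs, counter = {}, [], [0]

    def new_id():  # plays the role of ID-GENERATOR
        counter[0] += 1
        return counter[0]

    def search(x, y, res):
        # BLACK block equal to or containing the equal-sized neighbour, else None.
        if not (0 <= x < 2 ** n and 0 <= y < 2 ** n):
            return None
        for r in range(res, n + 1):
            size = 2 ** r
            candidate = (x - x % size, y - y % size, r)
            if candidate in black:
                return candidate
        return None

    def assign_label(node, adj):
        if node in label and adj in label:
            if label[node] != label[adj]:
                pairs.append((label[node], label[adj]))
        elif node in label:
            label[adj] = label[node]
        elif adj in label:
            label[node] = label[adj]
        else:
            label[node] = label[adj] = new_id()

    # Phase one: visit blocks in ascending size order (the job of MAP).
    for p in sorted(blocks, key=lambda b: b[2]):
        x, y, res = p
        for side in "NESW":
            if side not in explored[p]:
                dx, dy = OFFSET[side]
                q = search(x + dx * 2 ** res, y + dy * 2 ** res, res)
                if q is not None:
                    explored[q].add(OPSIDE[side])
                    assign_label(p, q)
        if p not in label:
            label[p] = new_id()

    # Phase two: merge the equivalence pairs (a plain union-find).
    parent = {i: i for i in range(1, counter[0] + 1)}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in pairs:
        parent[find(b)] = find(a)

    # Phase three: replace every label by its class representative.
    return {b: find(label[b]) for b in blocks}


# A 4 by 4 image (n = 2): a 2 by 2 block with one pixel attached, a row of
# four pixels, and one isolated pixel.
blocks = [(0, 0, 1), (0, 3, 0), (3, 3, 0), (1, 3, 0), (2, 3, 0), (2, 0, 0), (3, 1, 0)]
labels = label_components(blocks, 2)
```

In this example the row of four pixels first receives two different labels, and the equivalence pair that joins them is generated when the pixel at (1, 3) explores its eastern side.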

Example: As an example of the application of the 
algorithm, consider the region given in Fig. 2 whose 
block decomposition is given in Fig. 3. The BLACK 
nodes have been numbered in the order in which they 
were visited by phase one. Thus node 1 has been 
visited before nodes 2, 3, etc. The labels assigned to 
the two components by the first phase of the algorithm 
are shown in Fig. 4. A short explanation about Fig. 4 
is necessary at this point. When node 7 is visited, 
neither node 7 nor node 11, its eastern neighbor, has 
been labeled yet, thus label d is generated and assigned 
to both. When node 8 is visited, it has no label, but its 
northern neighbor, node 11, has already been 
assigned the label d, and thus node 8 is assigned the 
label d as well. 

Fig. 4 illustrates the status of the image at the 
conclusion of the first phase of the algorithm. It has 
four different labels: a, b, c and d, with a equivalent to 
b, and b equivalent to c. The equivalence pair (a, b) 
was generated when node 9 was visited and its 
northern adjacency was explored. In essence, node 9 
was labeled with a when node 1's western adjacency 
was explored, whereas node 10 was labeled with b 
when node 2's western adjacency was explored. 
Similarly, the equivalence pair (b,c) was generated 
when node 5 was visited. 

Applying the second phase of the algorithm to the 
generated equivalence pairs results in the following 
two equivalence classes: {a,b,c} and {d}. 
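
The merge performed by phase two on this example can be reproduced with a small union-find. The sketch below is one common formulation (path halving, no ranking) and is not necessarily the variant of UNION-FIND that the paper cites; the function name is chosen for this example.

```python
def equivalence_classes(labels, pairs):
    """Merge equivalence pairs into classes with a simple union-find."""
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = {}
    for x in labels:
        classes.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in classes.values())


# Labels and equivalence pairs from the example of Fig. 4.
classes = equivalence_classes("abcd", [("a", "b"), ("b", "c")])
```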

Fig. 5 shows the labels updated by the third phase 
of the algorithm. 

Theorem 1: The time complexity of the connected 
component labeling algorithm is O(n·N). 

Proof: Constructing the MAP requires time O(N·log N). 
Phase one calls procedure EXPLORE O(N) times (at most 
once for each side of each of the N nodes), and each 
call requires time O(n + log N), where the terms n and 
log N originate from the invocations of procedures 
EQ-NEIGHBOR and SEARCH, respectively. Therefore 
phase one takes time O(n·N + N·log N). Phase two 
requires time O(N·log N) [9]. Phase three requires 
time O(N). Since log N < 2n, the time complexity of 
the algorithm is therefore O(n·N). 
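
Written out as a single chain, the bound in the proof reads as follows. The inequality log N < 2n is taken from the proof itself; it presumably reflects the fact that a 2^n by 2^n image holds at most 4^n pixels, which is an interpretation rather than something the text states.

```latex
\begin{align*}
T(n, N) &= \underbrace{O(N \log N)}_{\text{MAP}}
         + \underbrace{O(nN + N \log N)}_{\text{phase one}}
         + \underbrace{O(N \log N)}_{\text{phase two}}
         + \underbrace{O(N)}_{\text{phase three}} \\
        &= O(nN + N \log N) \\
        &= O(nN), \qquad \text{since } \log N < 2n \text{ gives } N \log N < 2nN .
\end{align*}
```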
Fig. 3. Decomposition of Fig. 2. Fig. 4. Results of Phase 1. 

6. COMPUTING THE PERIMETER 

Perimeter computation is another basic operation 
in image processing. Algorithms computing the 
perimeter of a region in a binary image represented 
either by an array or by a chain code are contained in 
[5]. An algorithm for computing the perimeter of a 
region encoded as a quadtree has also been developed 
by Samet [7]. 

Fig. 5. Labels Resulting From Phase 3. Fig. 6. A Connected Region. 

The following perimeter computation algorithm 
traverses the MLQ in ascending size order. For each 
node P in the MLQ being visited, the length of each of 
its four sides is first included in the value of the 
perimeter. Then the neighbor nodes of P which have 
not been previously visited need to be considered. For 
each adjacent node Q that is BLACK, twice the length 
of the common side is deducted from the value of the 
perimeter. This reflects the fact that the segment 
between P and Q does not belong to the boundary of 
the region. The factor 2 occurs because the adjacency 
between two BLACK nodes is explored once and only 
once due to the traversal strategy used. 

For example, given the BLACK node D in Fig. 7, 
the common segment between D and its southern 
neighbor A is explored by the time D is visited, but the 
same common segment is not considered when A is 
visited. Therefore the length of this segment DA has 
to be deducted in advance when D is visited. 

The following procedure PERIMETER specifies the 
algorithm. 



Fig. 7. Decomposition of Fig. 6. 


Procedure PERIMETER(M, N) 
begin 
  construct MAP; 
  perimeter := 0; 
  for i := 1 to N do 
  begin 
    j := MAP[i]; 
    segment := 2 ** M[j].RES; 
    perimeter := perimeter + 4 * segment; 
    for side in {N, E, S, W} do 
      if UNEXPLORED(M[j], side) then 
      begin 
        neighbor := EQ-NEIGHBOR(M[j], side); 
        k := SEARCH(M, neighbor); 
        if k > 0 then 
        begin 
          perimeter := perimeter - 2 * segment; 
          mark OPSIDE(side) of M[k] "explored"; 
        end; 
      end; 
  end; 
  return(perimeter); 
end; 
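
A runnable sketch of procedure PERIMETER is given below. As an assumption of this example, blocks are written as (x, y, res) tuples (lower-left pixel and resolution) and a set lookup that climbs through the enclosing blocks stands in for SEARCH on the array M.

```python
OFFSET = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPSIDE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def perimeter(blocks, n):
    """Perimeter of the region formed by BLACK blocks in a 2**n by 2**n image."""
    black = set(blocks)
    explored = {b: set() for b in blocks}

    def search(x, y, res):
        # BLACK block equal to or containing the equal-sized neighbour, else None.
        if not (0 <= x < 2 ** n and 0 <= y < 2 ** n):
            return None
        for r in range(res, n + 1):
            size = 2 ** r
            candidate = (x - x % size, y - y % size, r)
            if candidate in black:
                return candidate
        return None

    total = 0
    for p in sorted(blocks, key=lambda b: b[2]):  # ascending size order
        x, y, res = p
        segment = 2 ** res
        total += 4 * segment  # count all four sides first
        for side in "NESW":
            if side not in explored[p]:
                dx, dy = OFFSET[side]
                q = search(x + dx * segment, y + dy * segment, res)
                if q is not None:
                    total -= 2 * segment  # shared edge belongs to neither boundary
                    explored[q].add(OPSIDE[side])
    return total


# A 2 by 2 block with two pixels stacked on its east side forms a 3 by 2 rectangle.
rectangle = perimeter([(0, 0, 1), (2, 0, 0), (2, 1, 0)], 2)
single = perimeter([(1, 1, 0)], 2)
apart = perimeter([(0, 0, 0), (3, 3, 0)], 2)
```

The last call illustrates the behaviour on a disconnected region: the result is the sum of the perimeters of the individual components.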

The key to this algorithm is that each node in the 
MLQ is visited once and, at most, its four neighbors 
need be explored. Such an advantage is achieved by 
traversing the MLQ in ascending size order. 
Otherwise, in the worst case, when the node being 
visited is of size 2^(n-1) by 2^(n-1), 2^(n-1) nodes need be 
searched as in Samet's algorithm [7]. 

Example: Consider the region given in Fig. 6. The 
corresponding block decomposition is shown in 
Fig. 7. The MLQ contains six BLACK nodes 
representing blocks A, B, C, D, F and G. Assuming 
n=3, the perimeter is 30. Procedure PERIMETER 
visits the BLACK nodes in the order: B, C, D, F, G 
and A. 

The following table contains a step-by-step trace 
through the algorithm for this example. The symbols 
'<])' and stand for don't care and non-existence, 
respectively. 

Theorem 2: The time complexity of the algorithm 
PERIMETER is O(n·N). 

Proof: Similar to the proof of Theorem 1. 

Note that if the region is not connected, i.e., it 
contains more than one connected component, then the 
algorithm will return the sum of the perimeters of 
each connected component. It is, however, not 
difficult to compute the perimeter of every connected 
component of the region simultaneously in the same 
time complexity with a minor modification of the 
algorithm, provided that all connected components 
have been labeled. 

7. CONCLUSION 

Techniques for labeling connected components and 
computing the perimeter of a region have been 
described. The algorithm for labeling connected 
components is superior to the one using a standard 
linear quadtree [3] in the sense that it is capable of 
handling regions with arbitrary configurations. By 
the same token, the perimeter computation algorithm 
shares the same advantage. 

Abstract 

Asterisk* is a testbed system designed to support the 
development of new kinds of splines. The key concept is the 
integration of symbolic computation facilities with tools for 
interactively modifying and comparing different splines. By 
modeling a spline as a list of attributes, Asterisk* can be 
used to create and manipulate almost any spline, without 
making assumptions about the future directions of spline 
research. 

Résumé 

Asterisk* est un système d'étude créé pour appuyer le 
développement de nouvelles sortes de courbes à base de 
splines. Le concept clef est l'intégration d'outils interactifs 
pour modifier et comparer différentes courbes, avec un 
système de calcul symbolique. En décrivant une courbe par 
une liste d'attributs, le système Asterisk* peut être utilisé 
pour créer et manipuler presque n'importe quelle courbe à base 
de splines sans présumer les directions de cette recherche à 
l'avenir. 

KEYWORDS: geometric modeling, software, symbolic 
computation, parametric curves. 


1. Introduction and Motivation 

A spline is a mathematical formulation of a curve 
used for a wide variety of modeling applications. Unlike 
polygonal representations, splines provide compact and 
resolution-independent descriptions of complex objects. 
Many splines combine a set of control vertices with a set of 
blending functions to determine the path of a curve through 
space. Different splines have different properties, and the 
choice of which spline to use depends on the problem at 
hand. For example, some splines interpolate (pass through) 
the control vertices, while others approximate (pass near) 
them. 

There is no single type of spline that is ideal for all 
applications; each spline is a tool best suited to some 
particular set of tasks. Current spline research involves 
developing new splines with specific properties as solutions 
to different problems. To create a spline representation 
with a desired set of properties, the blending functions 
must satisfy a set of constraints. Sometimes this involves 
combining constraints from existing splines to arrive at a 
new formulation [4]; in other cases it requires substantial 
trial and error to find a suitable set of constraints that 
produces the desired properties. 

Most splines can be described mathematically in simple 
terms. In spite of this, establishing the constraints that 
correspond to desired properties, and then solving for a 
set of blending functions that satisfy those constraints is 
a task that can easily become overwhelming. Computer 
algebra systems such as Vaxima [5] can perform error-free 
symbolic (as opposed to numeric) computations of 
complexity too great for humans to handle. Vaxima 
takes a description of the constraints and produces a 
symbolic representation of the blending functions. Recent 
efforts in spline development have taken advantage of such 
capabilities [1,4]. 

To understand the behaviour of new splines, it is useful 
to draw them, preferably in an interactive setting. This 
generally requires an evaluation routine to compute points 
on the curve, and a collection of routines to graphically 
display and modify the curve. The coding of the evaluation 
routine requires tedious translation from the symbolic 
representation into the implementation language. 

Intuition gained from visual feedback, as well as 
comparison with other spline representations, often leads to 
modification of the constraints that define the spline. This 
in turn changes the symbolic representation, necessitating 
a recoding of the evaluation routine. 

We would like to tighten this specification/visualization 
loop and automate the translation step. Feng & Riesenfeld [6] 
describe just such a system for the development 
of Boolean sum surfaces. Although Feng & Riesenfeld had 
access to the symbolic algebra system REDUCE [7], they 
chose not to use it for two reasons: they were interested 
primarily in simple operations on rational bivariate 
polynomials, and they felt that they could not provide a 
sufficiently high level of user interaction by running a large 
system like REDUCE. Instead, Feng & Riesenfeld implemented 
a small, specialized algebra system capable of performing 
operations such as operator composition, evaluation, and 
symbolic differentiation of bivariate polynomials. 

This paper describes a testbed system called Asterisk* 
designed to integrate the power of symbolic computation 
with convenient, extensible, interactive tools for modifying 
and comparing different splines [8]. Asterisk* is intended 
for use by those doing research in spline techniques rather 
than those wishing to design specific objects using splines. 
Our system is similar to that of Feng & Riesenfeld, but 
differs in that the emphasis is on generality. Since future 
spline representations may not conform to all the paradigms 
of current spline techniques, one of our primary goals was 
to avoid making too many assumptions about the future 
directions of spline research. 

Asterisk* supports generality in three ways: the use 
of a complete symbolic algebra system (Vaxima), an 
extensible model of a spline (a list of attributes), and 
an extensible set of interactive spline management and 
graphics routines (written in Lisp). The blending functions 
may be developed and verified within Vaxima, and then 
used directly to draw or plot the resulting curves. As 
new algorithms are developed to manipulate the spline 
representations, routines to implement these algorithms can 
be written quickly, thus extending the basic building blocks 
that Asterisk* provides. In this way, spline researchers 
can adapt the interactive testbed to their particular 
requirements and preferences. 

2. The Asterisk* Solution 

2.1. The Asterisk* Model of a Spline 

Throughout this paper we use the term spline to mean 
a piecewise parametric function of the form 

    Q(u) = \sum_{i=0}^{m} V_i B_i(u)        (2.1) 

where the V_i represent a set of m + 1 control vertices and 
the B_i(u) represent a set of blending functions. 
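
Equation (2.1) translates almost directly into code. The sketch below evaluates Q(u) for two-dimensional control vertices; the cubic Bernstein polynomials are used purely as a familiar example of blending functions, and the vertex values and function names are assumptions of this example.

```python
def bernstein3(i):
    """Cubic Bernstein polynomial B_i, used here only as a sample blending function."""
    coeff = (1, 3, 3, 1)[i]
    return lambda u: coeff * u ** i * (1 - u) ** (3 - i)


def evaluate(vertices, blending, u):
    """Q(u) = sum over i of V_i * B_i(u), for 2-D control vertices."""
    weights = [b(u) for b in blending]
    x = sum(w * v[0] for w, v in zip(weights, vertices))
    y = sum(w * v[1] for w, v in zip(weights, vertices))
    return (x, y)


V = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]  # m + 1 = 4 control vertices
B = [bernstein3(i) for i in range(4)]

start = evaluate(V, B, 0.0)
middle = evaluate(V, B, 0.5)
end = evaluate(V, B, 1.0)
```

With these blending functions the curve interpolates the first and last control vertices and only approximates the two in between, which is the distinction drawn in Section 1.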

Asterisk* provides a uniform definition protocol for 
spline specification by modeling a spline as a list of 
attributes. Each attribute is a <name,value> pair. Table 1 
summarizes some common attributes. In Lisp, these are 
stored as property lists of the spline's name. For instance, 
the attribute list for a spline named "ucb_spline" is stored 
on the property list for the symbol ucb_spline. The 
attribute list in Figure 1 was used to define the spline 
illustrated in Colour Plate 1. 


(splinename     ucb_spline 
 polycolor      GREEN 
 polystyle      DASHED 
 polydisplay    ON 
 curvecolor     WHITE 
 curvestyle     SOLID 
 curvedisplay   ON 
 controlpolygon ((.42 .41) (.41 .73) (.55 .86) 
                 (.86 .76) (.86 .59) (.61 .58) 
                 (.61 .38) (.83 .35)) 
 evalroutine    ucb_eval 
 parameterstep  0.0625) 

Figure 1: The attribute list for ucb_spline. 


2.2. System Architecture 

Conceptually, the Asterisk* testbed consists of three 
logical parts: 

• A Vaxima to Lisp translator for automatically generating 
evaluation routines from Vaxima descriptions. This 
is the transition from the analytic expressions for the 
blending functions (the symbolic representation), to the 
procedures that evaluate the blending functions (the 
numeric representation required for graphical display). 

• Interface routines for support of graphical interactions 
such as geometric input or display of a curve. This module 
provides a device independent graphical interface to 
the rest of the system. 



Attribute Name   Value Type          Purpose 
---------------  ------------------  ------------------------------------------- 
controlpolygon   List of vertices    Rough specification of curve path. 
evalroutine      Name of a function  Names the function which evaluates points 
                                     on the curve. 
parameterstep    Real number         The amount the domain parameter u is varied 
                                     between consecutive calls to the evaluation 
                                     routine. 
polystyle        DOTTED, DASHED,     Style used to draw control polygon (curve). 
(curvestyle)     SOLID 
polycolor        Colour name         Colour used to draw control polygon (curve). 
(curvecolor) 
polydisplay      ON, OFF             Should polygon (curve) be displayed? 
(curvedisplay) 
degree           Integer             The polynomial degree of the curve. 
knotvector       List of real        Specifies a set of parameter values 
                 numbers u_i         associated with the breakpoints between 
                                     curve segments. This attribute is useful 
                                     for B-spline curves and cubic interpolatory 
                                     splines (cf. Bartels et al [3]). 
beta1            Real number         Shape parameters for geometrically 
(beta2)                              continuous techniques such as 
                                     Beta-splines [1,2]. 

Table 1: Common Attributes 


• Spline management routines for manipulating attribute 
lists. The spline management routines facilitate 
convenient modification of the spline attributes. These 
routines invoke the interface routines to interact with 
the user and invoke the evaluation routines to generate 
curve segments for each spline. 

When a draw curve operation is initiated, the spline 
management routines repeatedly invoke the evaluation 
routine for the spline, passing in a value of the parameter. The 
parameterstep attribute determines which, and how many, 
parameter values are passed in. It is the responsibility of 
the evaluation routine to return the point on the curve 
corresponding to each given parameter value. This is done by 
referring to the specific attributes that are needed; 
attributes which do not affect the computation are ignored. 
The spline management routines invoke the interface 
routines to connect the points on the curve into a piecewise 
linear approximation of the curve. 
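
The draw curve loop described above might look roughly like the following. The dictionary representation, the `EVAL_ROUTINES` registry, the `line_eval` routine, and the assumption that the parameter runs over [0, 1] are all choices made for this sketch; Asterisk* itself stores attributes on Lisp property lists.

```python
def line_eval(spline, u):
    """Hypothetical evaluation routine: a straight line from the first to the
    last control vertex. It reads only controlpolygon and ignores the rest."""
    (x0, y0) = spline["controlpolygon"][0]
    (x1, y1) = spline["controlpolygon"][-1]
    return ((1 - u) * x0 + u * x1, (1 - u) * y0 + u * y1)


EVAL_ROUTINES = {"line_eval": line_eval}  # stands in for lookup by function name


def sample_curve(spline):
    """Call the spline's evaluation routine once per parameter step."""
    evaluate = EVAL_ROUTINES[spline["evalroutine"]]
    step = spline["parameterstep"]
    count = round(1.0 / step)  # assumes the parameter domain is [0, 1]
    return [evaluate(spline, k * step) for k in range(count + 1)]


spline = {
    "splinename": "demo_line",
    "polycolor": "GREEN",  # display attribute, never read by line_eval
    "controlpolygon": [(0.42, 0.41), (0.41, 0.73), (0.83, 0.35)],
    "evalroutine": "line_eval",
    "parameterstep": 0.25,
}
points = sample_curve(spline)  # vertices of the piecewise linear approximation
```

Because `line_eval` looks up only the attribute it needs, adding further attributes to the dictionary leaves it unaffected, which mirrors the extension mechanism of the attribute-list model.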

Ideally, we would like the spline management and 
interface routines running within the Vaxima environment. 
This approach was taken in an early implementation; 
unfortunately, the code space requirements of Vaxima caused 
excessive page faulting, resulting in poor performance. This 
problem can be avoided by executing Vaxima and the 
Vaxima to Lisp translator in one process, and the spline 
management and interface routines in another. In the current 
implementation these processes run on separate machines 
and communicate via disk files transmitted over a local area 
network. An advantage of the two-process structure is the 
ability to substitute an alternate symbolic computation 
system, such as REDUCE, by supplying an appropriate 
translation routine. 

2.3. Extension Features 

Asterisk* supports extensibility in two ways. Both 
the set of functions that manipulate splines and the data 
structures used to represent splines can be extended as 
needed. 

Functional extension is a byproduct of the Lisp 
environment. Extension of spline representations follows from the 
fact that each evaluation routine ignores attributes which 
do not affect its computation. Thus, not all splines require 
all attributes, and new attributes can easily be added for 
evaluation routines which require them. More importantly, 
these additions are completely transparent to other 
evaluation routines, meaning that existing code need not be 
modified. 

3. Applications 

3.1. Comparisons of Spline Curves 

It is often desirable to compare and contrast the 
behaviour of various curves. For instance, one might want 
to determine the effect of changing the control polygon, 
the shape parameters, or the polynomial degree of a curve. 
One may also wish to compare the shapes of two curves 
of different types defined by the same control polygon. In 
Asterisk*, all of these comparisons can be accomplished 
using the following three steps: 

1. Define the curve that is to be the basis of the 
comparison. 

2. Make a copy of the curve by copying its attribute 
list. 

3. Change one (or more) of the new curve’s attributes. 

To visually distinguish between the two curves it is 
often useful to change display attributes such as the line 
style or colour of the control polygon or curve. Colour 
Plates 1 through 4 illustrate such comparisons. 
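
The three steps can be mimicked with an ordinary dictionary copy. In the sketch below the attribute names follow Table 1 and Figure 1, while the names `ucb_bezier` and `bezier_eval` are hypothetical; the scenario loosely follows Colour Plate 2, where two curves share a control polygon but use different evaluation routines.

```python
import copy

# Step 1: the curve that is the basis of the comparison.
base = {
    "splinename": "ucb_spline",
    "curvecolor": "WHITE",
    "controlpolygon": [(0.42, 0.41), (0.41, 0.73), (0.55, 0.86), (0.86, 0.76)],
    "evalroutine": "ucb_eval",
    "parameterstep": 0.0625,
}

# Step 2: copy the attribute list (a deep copy, so later edits stay independent).
variant = copy.deepcopy(base)

# Step 3: change the attributes of interest on the copy.
variant["splinename"] = "ucb_bezier"   # hypothetical name
variant["evalroutine"] = "bezier_eval"  # hypothetical evaluation routine
variant["curvecolor"] = "RED"           # display change to tell the curves apart

changed = sorted(k for k in base if base[k] != variant[k])
```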

3.2. The Development of a New Curve Type 

As mentioned in Section 1, one of the main motivations 
for Asterisk* was the ability to easily define and assess 
new curve types in an interactive setting. The process 
of designing new curve types typically involves repeated 
iteration through the following steps: 

1. Determine the desired properties for the curve. 

2. State these properties as mathematical constraints. 

3. Use Vaxima to solve the constraints. 

4. Interactively assess and experiment with the curve. 

As an example, Table 2 describes a new curve that was 
constructed and tested using Asterisk*. The three columns 
of the table contain the desired properties for the curve 
(step 1), the corresponding mathematical constraints (step 
2), and the associated Vaxima expressions (step 3). In the 
bottom right section of the table, the lengthy derivation of 
the Vaxima expressions describing the convex hull property 
has been omitted for the sake of brevity. Figure 2 contains 
the Lisp evaluation routine produced by the Vaxima to Lisp 
translator. 

To interactively assess the new technique (step 4), we 
create a new spline attribute list and graphically specify a 
control polygon to which the evaluation routine is applied 
(see Colour Plate 3). The techniques of Section 3.1 can 
then be used to observe the behaviour of the curve under 
various conditions. 
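
As a numerical cross-check, the blending functions listed in Figure 2 can be transcribed and tested against the interpolation, symmetry, and convex hull constraints of Table 2. The Python transcription and the sampling grid are choices made for this sketch; the derivative equations of Table 2 are not examined here.

```python
def b0(u):
    return 1.0 - 5.0 * u + 8.0 * u ** 2 - 4.0 * u ** 3


def b1(u):
    return 4.0 * u - 8.0 * u ** 2 + 4.0 * u ** 3


def b2(u):
    return 4.0 * u ** 2 - 4.0 * u ** 3


def b3(u):
    return u - 4.0 * u ** 2 + 4.0 * u ** 3


TOL = 1e-12
grid = [k / 20 for k in range(21)]  # sample parameter values in [0, 1]

# Interpolation: Q(0) = V_0, Q(1/2) = (V_1 + V_2) / 2, Q(1) = V_3.
at_zero = [b(0.0) for b in (b0, b1, b2, b3)]
at_half = [b(0.5) for b in (b0, b1, b2, b3)]
at_one = [b(1.0) for b in (b0, b1, b2, b3)]

# Symmetry: B_0(u) = B_3(1 - u) and B_1(u) = B_2(1 - u).
symmetric = all(
    abs(b0(u) - b3(1 - u)) < TOL and abs(b1(u) - b2(1 - u)) < TOL for u in grid
)

# Convex hull: the functions sum to one and are non-negative on [0, 1].
sums_to_one = all(abs(b0(u) + b1(u) + b2(u) + b3(u) - 1.0) < TOL for u in grid)
non_negative = all(b(u) >= -TOL for b in (b0, b1, b2, b3) for u in grid)
```

On this grid all three groups of checks hold, so the listed routine is at least consistent with those constraints.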

3.3. Interactive Testing of Algorithms 

To this point, we have concentrated on the specification 
and comparison of splines themselves. As mentioned 
above, this involves creation and modification of spline 
attribute lists, using various operations which Asterisk* 
provides. Another aspect of current spline research is the 
development of algorithms that operate on splines. To 
test such an algorithm interactively without the benefit of 
a testbed system such as Asterisk*, one would be forced 
to write a specialized program to manage data structures 
and user-interaction, in addition to performing the required 
computation. 

With Asterisk*, one is able to concentrate on writing 
a Lisp function to implement the new algorithm alone; 
data structure manipulation is handled by the spline 


Property:     Cubic polynomial curve defined by 4 control vertices. 
Constraints:  Q(u) = \sum_{i=0}^{3} V_i B_i(u), 
              where B_i(u) = \sum_{j=0}^{3} k_{ij} u^j 
Vaxima:       poly(a,b,c,d) := a+b*u+c*u^2+d*u^3; 
              b0(u) := ''(poly(k00,k01,k02,k03)); 
              b1(u) := ''(poly(k10,k11,k12,k13)); 
              b2(u) := ''(poly(k20,k21,k22,k23)); 
              b3(u) := ''(poly(k30,k31,k32,k33)); 

Property:     Symmetry: reversing the control points reverses the curve. 
Constraints:  B_0(u) = B_3(1-u) 
              B_1(u) = B_2(1-u) 
Vaxima:       b2(u) := ''(b1(1-u)); 
              b3(u) := ''(b0(1-u)); 
              unknowns : [k00,k01,k02,k03, 
                          k10,k11,k12,k13]; 

Property:     Interpolation of first and last control vertices and the 
              midpoint of the center leg of the control polygon. 
Constraints:  Q(0) = V_0 
              Q(1/2) = (V_1 + V_2)/2 
              Q(1) = V_3 
Vaxima:       e0:b0(0)=1; e1:b0(1/2)=0; e2:b0(1)=0; 
              e3:b1(0)=0; e4:b1(1/2)=1/2; e5:b1(1)=0; 

Property:     Convex hull: the curve should lie entirely within the 
              smallest convex region containing the control polygon. 
Constraints:  \sum_{i=0}^{3} B_i(u) = 1 
              B_i(u) \ge 0, u \in [0,1] 
Vaxima:       b0d1(u) := ''(diff(b0(u),u)); 
              b1d1(u) := ''(diff(b1(u),u)); 
              e6:b0d1(0) = -1; e7:b1d1(0)=0; 
              equations : [e0,e1,e2,e3,e4,e5,e6,e7]; 
              answer : linsolve(equations, unknowns); 

Table 2: Definition of a New Curve Type 

(def b0 (lambda (u) 
   (+ 1.0 (*. -5.0 u) 
          (*. 8.0 (expt u 2.0)) 
          (*. -4.0 (expt u 3.0))))) 

(def b1 (lambda (u) 
   (+ (*. 4.0 u) 
      (*. -8.0 (expt u 2.0)) 
      (*. 4.0 (expt u 3.0))))) 

(def b2 (lambda (u) 
   (+ (*. 4.0 (expt u 2.0)) 
      (*. -4.0 (expt u 3.0))))) 

(def b3 (lambda (u) 
   (+ u (*. -4.0 (expt u 2.0)) 
        (*. 4.0 (expt u 3.0))))) 

Figure 2: Code produced by the Vaxima to Lisp translator 
for the example of Table 2. 

management routines and user interaction is handled by 
the interface routines. If the operation provided by the 
algorithm is of lasting interest, it is a simple matter to 
include it on one of the Asterisk* menus, thereby allowing 
the user to graphically invoke the algorithm by selecting 
the menu item and the spline that is to be operated on. 

As a specific example, consider an algorithm to perform 
linear subdivision of polynomial splines. The subdivision 
process takes as input the control polygon describing one 
segment of a curve, and returns a control polygon that 
generates only a portion of the original curve (see Colour 
Plate 4). This operation was added to Asterisk* by 
writing a Lisp function that requests a copy of a spline’s 
attribute list, and replaces the controlpolygon attribute 
with a subdivision polygon. The new operation can now 
be applied interactively and the subdivided curves can be 
directly compared to the original ones. 
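
The text does not spell out the subdivision algorithm, so the sketch below substitutes de Casteljau's construction for a Bezier segment, which yields the control polygon of the portion of the curve over [0, t]. The function names and the sample polygon are assumptions of this example; the value t = 2/3 matches the interval shown in Colour Plate 4.

```python
def lerp(p, q, t):
    """Point a fraction t of the way from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def bezier(polygon, u):
    """Evaluate a Bezier curve at u by repeated linear interpolation."""
    pts = list(polygon)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], u) for i in range(len(pts) - 1)]
    return pts[0]


def subdivide_left(polygon, t):
    """Control polygon of the part of the Bezier curve over [0, t]."""
    left = [polygon[0]]
    pts = list(polygon)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
        left.append(pts[0])  # first point of each de Casteljau level
    return left


original = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
t = 2.0 / 3.0
subpolygon = subdivide_left(original, t)

# The subdivided curve over [0, 1] should retrace the original over [0, 2/3].
worst = max(
    abs(a - b)
    for s in (0.0, 0.25, 0.5, 0.75, 1.0)
    for a, b in zip(bezier(subpolygon, s), bezier(original, s * t))
)
```

In the setting of the paper, the returned polygon would replace the controlpolygon attribute of a copied attribute list, leaving every other attribute untouched.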

4. Summary 

Symbolic algebra systems have allowed increasingly 
complex spline formulations to be developed. However, 
interactive visual feedback is needed to understand the 
behaviour of the resulting curves. Intuition thus gained 
often leads to reformulation of the underlying mathematics. 

Asterisk* has been developed to tighten this design 
loop by automating the conversion between symbolic 
and graphical representations. As such, it provides a 
framework for the interactive development, manipulation, 
and comparison of new splines; it is not merely a program 
with a fixed set of built-in splines. Asterisk* also provides 
facilities for quickly implementing and exercising new 
algorithms that operate on spline representations. In 
addition, because Asterisk* allows one to concentrate on 
the mathematics and implementation of computational 
algorithms without worrying about user interaction and 


graphical display, we believe that it will prove useful as 
a tool for teaching basic curve and surface techniques. 

The Asterisk* model of a spline as a list of attributes 
does not restrict us to dealing with two-dimensional 
curves. Modifying the data structures and graphical display 
routines to handle three-dimensional curves and surfaces is 
an obvious extension. 

Colour Plate 1: A uniform cubic B-spline. The control 
polygon is coloured green and the curve is coloured white. 

Colour Plate 2: Two splines that share a common control 
polygon (shown in green), but have different evaluation 
routine attributes. The white curve is a uniform cubic 
B-spline, and the red curve is a seventh degree Bezier curve. 



Colour Plate 3: This plate illustrates a midpoint curve 
(shown in red), together with its control polygon (shown 
in green). As required by the representation, the curve 
interpolates the first and last control vertices, as well as 
the midpoint of the middle leg of the control polygon. 


Colour Plate 4: This plate demonstrates the behaviour 
of a subdivision algorithm for a cubic Bezier curve. The 
original control polygon (shown in green) generates the 
green curve when its parameter varies on the interval [0,1]. 
The subdivision polygon (shown in white) generates the 
white curve when its parameter varies on the interval 
[0,1]. The subdivision polygon is constructed so that 
the white curve is identical to the portion of green curve 
corresponding to the interval [0,2/3]. 

ABSTRACT. 

A new method for generating pictures is presented and 
illustrated with examples. The idea is to generate a string of 
symbols using an L-system, and to interpret this string as a 
sequence of commands which control a "turtle". Suitable 
generalizations of the notions of the L-system and of a turtle 
are introduced. The resulting mathematical model can be 
used to create a variety of (finite approximations of) fractal 
curves, ranging from Koch curves, to classic space-filling 
curves, to relatively realistic-looking pictures of plants and 
trees. All these pictures are defined in a uniform and 
compact way. 

RÉSUMÉ. 

Une nouvelle méthode pour engendrer des images est 
présentée et illustrée avec des exemples. Cette méthode 
comprend deux étapes. On commence par engendrer une 
séquence de symboles avec un L-système. Ensuite on utilise 
cette séquence pour contrôler les mouvements d'une tortue 
qui trace l'image en question. Les notions d'un L-système et 
d'une tortue sont généralisées pour mieux correspondre au 
but de la création des images. Le modèle mathématique 
résultant s'applique à la création d'une large variété d'objets 
fractals, y compris des courbes de Koch, des courbes qui 
remplissent toute une aire plane, ainsi que des images 
relativement réalistes des plantes et des arbres. Toutes ces 
images sont définies d'une manière homogène et compacte. 

KEYWORDS: L-systems, turtle geometry, fractals, space¬ 
filling curves, plants, trees. 


1. INTRODUCTION. 

Rewriting systems can be used to generate pictures in 
two different ways. In the first case, rewriting systems 
operate directly on two-dimensional objects, such as arrays 
[Kirsch 1964, Dacey 1970], graphs [Rosenfeld and Milgram 
1972, Pfaltz 1972], or "shapes" [Gips 1975, Stiny 1975]. In 
the second case, string grammars (in the broad sense of the 
word, including parallel rewriting systems) are used to define 
strings of symbols. A graphic interpretation function subse¬ 
quently maps these strings into pictures. This paper 
describes a method for generating pictures based on this 
second approach. After the idea of applying string grammars
to pictures is put into a historic perspective in Section
2, attention is focused on L-systems [Lindenmayer 1968].
The necessary definitions related to OL-systems are collected 
in Section 3. Section 4 adapts the notion of a "turtle" 
[Papert 1980, Abelson and diSessa 1982] to the purpose of 
graphical interpretation of strings, and presents examples of 
pictures generated by OL-systems under this interpretation. 
Section 5 extends this basic approach in two directions: by 
generalizing the notion of the L-system, and by increasing 
the range of string symbols interpreted by the turtle. Section 
6 presents conclusions and lists several open problems. 

2. THE HISTORICAL BACKGROUND. 

The idea of describing pictures using formal (string) 
languages emerged a few years after Chomsky established 
the fundamental concept of a phrase-structure grammar. 
Narasimhan [1962, 1966] and Ledley [1964, 1965] are 
credited with the first results in this area. Their interest was 
in the recognition of handwritten characters and chromo¬ 
somes, respectively. An approach designed for describing a 
wider class of pictures using string grammars was proposed 
by Shaw [1969]. For a survey of these early results, see Fu 
[1980]. 

The early research concentrated on picture recognition. 
Pictures were described as strings of symbols which 
represented selected primitives, such as straight segments, 
sharp V-turns, wide U-turns or branches. In some cases,
relations between picture elements, such as ABOVE,
BELOW, or INSIDE, were also considered as primitives.
The actual picture recognition was performed by parsing the 
resulting strings. 

In the case of picture generation, the correspondence 
between string symbols and picture primitives must be 
specified in more detail. The first such specification, known 
as chain coding, was developed by Freeman [1961]. Feder 
[1968] showed that the languages of chain codes describing 
such classes of figures as straight lines of arbitrary slope, 
circles of arbitrary radius, or convex figures in a plane, are 
all context sensitive. It was subsequently pointed out (for 
example, by Fu [1980]) that even intuitively simpler classes 
of pictures, for example the set of all rectilinear squares in 
an integer grid, correspond to context-sensitive chain-code 
languages. To a certain degree, this discouraged a further 
study of chain-code languages, for the context-sensitive 
grammars are usually difficult to construct and do not provide
an intuitively clear description of languages. Nevertheless,
picture generation using Chomsky grammars and the
chain interpretation has recently received considerable 
attention [Maurer, Rozenberg and Welzl 1982, Sudborough 
and Welzl 1985]. 

In order to describe growth of living organisms, Lin- 
denmayer [1968] introduced the notion of a parallel rewrit¬ 
ing system. The Lindenmayer systems, or L-systems, 
attracted the interest of many researchers, and the theory of 
L-systems was soon extensively developed [Herman and 
Rozenberg 1975, Lindenmayer and Rozenberg 1976]. How¬ 
ever, although a geometrical interpretation of strings was at 
the origin of L-systems, they were not applied to picture 
generation until 1984, when Aono and Kunii [1984], and 
Smith [1984] used them to create realistic-looking images of 
trees and plants. Approximately at the same time, Siromoney 
and Subramanian [1983] noticed that L-systems with chain- 
code interpretation could be used to generate some space¬ 
filling curves. 

This paper further investigates graphical applications of 
L-systems. The emphasis is on the turtle interpretation 
rather than the chain coding, because the turtle interpretation 
allows for generating pictures which are not confined to a 
grid. The notions of the L-system and of a turtle are gen¬ 
eralized to provide increased flexibility in picture 
specification. The resulting mathematical model is used to 
generate a wide spectrum of fractal curves. 

3. OL-SYSTEMS. 

This section summarizes fundamental definitions and 
notations related to OL-systems. For their tutorial introduc¬ 
tion, see Salomaa [1973], and Herman and Rozenberg 
[1975]. 

Let $V$ denote an alphabet, $V^*$ the set of all words
over $V$, and $V^+$ the set of all nonempty words over $V$.

Definition 3.1. A OL-system is an ordered triplet
$G = \langle V, \omega, P \rangle$ where $V$ is the alphabet of the system,
$\omega \in V^+$ is a nonempty word called the axiom and
$P \subset V \times V^*$ is a finite set of productions. If a pair
$(a, \chi)$ is a production, we write $a \to \chi$. The letter $a$ and
the word $\chi$ are called the predecessor and the successor of
this production, respectively. It is assumed that for any
letter $a \in V$, there is at least one word $\chi \in V^*$ such that
$a \to \chi$. A OL-system is deterministic iff for each $a \in V$
there is exactly one $\chi \in V^*$ such that $a \to \chi$.

Definition 3.2. Let $G = \langle V, \omega, P \rangle$ be a OL-system, and
suppose that $\mu = a_1 \ldots a_m$ is an arbitrary word over $V$. We
will say that the word $\nu = \chi_1 \ldots \chi_m$ is directly derived
from (or generated by) $\mu$ and write $\mu \Rightarrow \nu$ iff $a_i \to \chi_i$ for
all $i = 1, \ldots, m$. A word $\nu$ is generated by $G$ in a derivation
of length $n$ if there exists a sequence of words
$\mu_0, \mu_1, \ldots, \mu_n$ such that $\mu_0 = \omega$, $\mu_n = \nu$ and
$\mu_0 \Rightarrow \mu_1 \Rightarrow \ldots \Rightarrow \mu_n$.
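A derivation of length n in a deterministic OL-system can be sketched in a few lines of Python. The function name `derive` and the dictionary representation of the productions are illustrative choices, not part of the paper. As a simplifying assumption, a letter with no listed production is copied unchanged, which is equivalent to listing the identity production for it.

```python
def derive(axiom, productions, n):
    """Return the word obtained from `axiom` after n parallel rewriting steps."""
    word = axiom
    for _ in range(n):
        # All letters are rewritten simultaneously. A letter with no entry in
        # `productions` is copied unchanged (assumed identity production).
        word = "".join(productions.get(letter, letter) for letter in word)
    return word


# Productions X -> X+YF+ and Y -> -FX-Y with axiom X (the dragon-curve system).
dragon = {"X": "X+YF+", "Y": "-FX-Y"}
for n in range(3):
    print(n, derive("X", dragon, n))
```

Because every letter is replaced in the same step, the word is rebuilt from scratch on each pass rather than edited in place.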

4. GENERATING PICTURES USING OL-SYSTEMS 
WITH TURTLE INTERPRETATION. 

This section formalizes the notion of the turtle
interpretation of a string and provides examples of pictures 
generated by OL-systems under this interpretation. 

Definition 4.1. A picture $\Pi$ is a set of points in the plane:
$\Pi \in 2^{R \times R}$. A function $I: V^* \to 2^{R \times R}$ mapping the set of strings
over the alphabet $V$ into the set of pictures is called a
(graphic) interpretation function.

Definition 4.2. A state of the turtle is a triplet $(x, y, \alpha)$,
where the coordinates $(x, y)$ represent the turtle’s position,
and the angle $\alpha$, called the turtle’s heading, is interpreted as
the direction in which the turtle is facing. Given the step size $d$
and the angle increment $\delta$, the turtle can respond to commands
represented by the following symbols:

F Move forward a step of length $d$. The state of the turtle
changes to $(x', y', \alpha)$, where $x' = x + d \cos\alpha$ and
$y' = y + d \sin\alpha$. A line segment between points $(x, y)$
and $(x', y')$ is drawn.

f Move forward a step of length $d$ without drawing a
line.

+ Turn right by angle $\delta$. The next state of the turtle is
$(x, y, \alpha + \delta)$. (Here we assume that the positive orientation
of angles is clockwise.)

- Turn left by angle $\delta$. The next state of the turtle is
$(x, y, \alpha - \delta)$.

| Turn away. The state of the turtle changes to
$(x, y, \alpha + 180^\circ)$.

All other symbols are ignored by the turtle (the turtle 
preserves its state). 

Definition 4.3. Let $\nu$ be a string, $(x_0, y_0, \alpha_0)$ the initial state
of the turtle, and $d$, $\delta$ fixed parameters. The picture (set of
lines) drawn by the turtle responding to the string $\nu$ is called
the turtle interpretation of $\nu$.
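The turtle interpretation defined above can be sketched as follows. The function name `turtle_lines`, its argument names and the choice to return the final state alongside the segments are assumptions made for illustration. Angles are kept in degrees and the heading is simply increased by the angle increment on "+", so which way that looks on screen depends on whether the y axis points up or down.

```python
import math


def turtle_lines(word, d=1.0, delta_deg=90.0, start=(0.0, 0.0, 0.0)):
    """Interpret `word` and return (list of line segments, final turtle state)."""
    x, y, alpha = start              # alpha is the heading in degrees
    lines = []
    for symbol in word:
        if symbol in ("F", "f"):
            nx = x + d * math.cos(math.radians(alpha))
            ny = y + d * math.sin(math.radians(alpha))
            if symbol == "F":        # "f" moves without drawing
                lines.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif symbol == "+":
            alpha += delta_deg
        elif symbol == "-":
            alpha -= delta_deg
        elif symbol == "|":
            alpha += 180.0
        # Any other symbol is ignored: the state is preserved.
    return lines, (x, y, alpha)


segments, final_state = turtle_lines("F+F+F+F", d=2.0, delta_deg=90.0)
print(len(segments), final_state)
```

Feeding the function a word produced by an L-system derivation yields the picture as a list of segments that any plotting routine can draw.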

Figure 1 presents turtle interpretations of the words
generated by some deterministic OL-systems. In all cases, the
angle increment $\delta$ is equal to 90°. Data under each picture
indicate the length of derivation $n$, the step size $d$ (in pixels),
the axiom $\omega$ of the OL-system, and the set of
productions $P$. Note the variety of the shapes obtained, and
the simplicity of the underlying L-systems.

The Hilbert curve (Fig. 1f) is representative of classic
space-filling curves. Other well-known space-filling curves 
were discovered by Peano [1890] and Sierpinski [1912]. The 
Peano curve is generated by the L-system with axiom X and 
productions: 

X -> XFYFX+F+YFXFY-F-XFYFX 
Y -> YFXFY-F-XFYFX+F+YFXFY 
F -> F      + -> +      - -> -

A square-grid approximation of the Sierpinski curve is gen¬ 
erated by the L-system with axiom F+XF+F+XF and pro¬ 
ductions: 

X -> XF-F+F-XF+F+XF-F+F-X 
F -> F      + -> +      - -> -

Once again, note the simplicity of these descriptions. For 
the long history of creating possibly elegant and compact 
programs for generating space-filling curves, see [Null 1971, 
Wirth 1976, Goldschlager 1981, Witten and Wyvill 1983, 
Cole 1983, Cole 1985]. 

5. EXTENSIONS OF THE BASIC MODEL. 

This section generalizes the notion of the L-system and 
extends the notion of the turtle interpretation of a string. The 
purpose is to increase the flexibility of picture specification.

Various extensions to OL-systems have been thoroughly
studied in the past [Salomaa 1973, Herman and Rozenberg
1975, Lindenmayer and Rozenberg 1976]. Specifically, 2L-systems
use productions of the form $a_l < a > a_r \to \chi$; this
notation means that the letter $a$ (called the strict predecessor)
can produce word $\chi$ iff $a$ is preceded by letter $a_l$ and
followed by $a_r$. Thus, letters $a_l$ and $a_r$ form the left and the
right context of $a$ in this production. Productions in 1L-systems
have one-side context only; consequently, they are
either of the form $a_l < a \to \chi$ or $a > a_r \to \chi$. OL-systems,
1L-systems and 2L-systems belong to a wider class of $(k,l)$-systems.
In a $(k,l)$-system, the left context is a word of
length $k$, and the right context is a word of length $l$. However,
the strict predecessor is still limited to a single letter.

For the purpose of picture generation it is convenient to
generalize L-systems even further by allowing for productions
of the form $\eta_l < \eta > \eta_r \to \chi$, where all three components
of the predecessor are words of arbitrary length.
This modification of L-systems seriously affects the notion
of the direct derivation of words. In OL-systems and $(k,l)$-systems
the strict predecessors of all productions consist of
one letter. Consequently, in a derivation $\mu \Rightarrow \nu$ each letter
of $\mu$ is a strict predecessor of some production. On the other
hand, if the lengths of the strict predecessors $\eta$ may vary,
the partition of $\mu$ into strict predecessors depends on the
particular productions used to derive $\nu$. The method for
partitioning the word $\mu$ introduced below is based on scanning
the string $\mu$ from left to right.

Fig. 1. Examples of pictures generated by OL-systems under turtle interpretation.
Figure (a) is a quadratic Koch island [Mandelbrot 1982]. Figure (e) is the dragon
curve [Davis and Knuth 1970]. Figure (f) is the Hilbert [1891] curve.
(a) n=3, d=2; axiom F+F+F+F; F -> F+F-F-FF+F+F-F
(c) n=4, d=2; axiom F+F+F+F; F -> FF+F+F+F+FF
(e) n=14, d=1; axiom X; X -> X+YF+, Y -> -FX-Y, F -> F, + -> +, - -> -
(f) n=5, d=6; axiom X; X -> -YF+XFX+FY-, Y -> +XF-YFY-FX+, F -> F, + -> +, - -> -

Definition 5.1. A pL-system (pseudo-L-system) is an
ordered triplet $G = \langle V, \omega, p \rangle$, where $V$ is the alphabet of
the system, $\omega \in V^+$ is the axiom, and
$p: \{1, \ldots, N\} \to (V^* \times V^+ \times V^*) \times V^*$ is an ordered set of
productions. Instead of $p(i) = (\eta_l, \eta, \eta_r, \chi)$ we write that
production $p(i)$ is equal to $\eta_l < \eta > \eta_r \to \chi$. Let $l'$, $c'$ and $r'$
denote the lengths of the strings $\eta_l$, $\eta$ and $\eta_r$, respectively.
Production $p_i$ matches string $\mu = a_0 \ldots a_m$ at position $s$,
$0 \le s \le m$, iff the string $\mu$ can be represented as

$a_0 \ldots a_{s-l'-1} \, \eta_l \, \eta \, \eta_r \, a_{s+c'+r'} \ldots a_m$.

The production $p_i$ is applicable to the string $\mu$ at position $s$
iff it matches $\mu$ at position $s$ and no production $p_j$ preceding
$p_i$ ($j < i$) matches $\mu$ at this same position.

Note. In order to keep specifications of L-systems short, it is
convenient to assume that all productions of the form
$a \to a$, $a \in V$, are automatically appended at the end of any
set $p$. Consequently, it is not necessary to list these productions
when specifying $p$. On the other hand, any production
of the form $a \to \chi$ entered explicitly into the set $p$ will
precede $a \to a$. Thus, production $a \to a$ can be "overwritten",
if necessary.

Definition 5.2. Let $G = \langle V, \omega, p \rangle$ be a pL-system, and
suppose that $\mu$ is an arbitrary word over $V$. We will say that
the word $\nu$ is directly derived from $\mu$ and write $\mu \Rightarrow \nu$ iff
there exists a sequence of productions
$p^{(t)} = \eta_l^{(t)} < \eta^{(t)} > \eta_r^{(t)} \to \chi^{(t)}$ and a sequence of positions
$s^{(t)}$ ($t = 0, \ldots, k$; $k \le m$) such that:

• $\mu = \eta^{(0)} \ldots \eta^{(k)}$,

• $\nu = \chi^{(0)} \ldots \chi^{(k)}$,

• $p^{(0)}$ is applicable to the string $\mu$ at position $s^{(0)} = 0$,

• $p^{(t+1)}$ is applicable to the string $\mu$ at position
$s^{(t+1)} = s^{(t)} + \mathrm{length}(\eta^{(t)})$ for all $t = 0, 1, \ldots, k-1$.

Example. Consider a pL-system $G$ with the following
productions:

$p_1$ = X < XY -> YX
$p_2$ = X > X -> YXX

A word $\mu$ = XXYXY will be partitioned into strict predecessors
as follows: X XY X Y. The corresponding successors
are: YXX YX X Y. Thus, the word $\nu$ directly derived from
$\mu$ is equal to YXXYXXY.
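The left-to-right scan of Definitions 5.1 and 5.2 can be sketched in Python and checked against the example above. The tuple layout `(left, strict, right, successor)` and the function names are illustrative assumptions. Identity productions are not stored; they are applied as a fallback when no listed production matches, which has the same effect as appending them at the end of the ordered set.

```python
def matches(word, s, left, strict, right):
    """True if the production's contexts and strict predecessor fit at position s."""
    start = s - len(left)
    end = s + len(strict)
    return (start >= 0
            and word[start:s] == left
            and word[s:end] == strict
            and word[end:end + len(right)] == right)


def derive_step(word, productions):
    """One direct derivation: scan left to right, first matching production wins."""
    out = []
    s = 0
    while s < len(word):
        for left, strict, right, successor in productions:
            # The strict predecessor must be nonempty so that the scan advances.
            if strict and matches(word, s, left, strict, right):
                out.append(successor)
                s += len(strict)
                break
        else:
            out.append(word[s])   # implicit identity production a -> a
            s += 1
    return "".join(out)


# p1 = X < XY -> YX,  p2 = X > X -> YXX
example = [("X", "XY", "", "YX"), ("", "X", "X", "YXX")]
print(derive_step("XXYXY", example))
```

Contexts are always tested against the word being rewritten, never against the partially built output, so the step remains a parallel rewriting of the original string.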

The notion of the direct derivation is extended to the 
derivation of length n as in the case of OL-systems. Two 
curves generated by pL-systems are shown in Fig. 2. 

A useful extension of the set of commands interpreted 
by the turtle introduces two symbols "[" and "]" defined as 
follows: 

[ Push current state of the turtle into a (pushdown) stack. 

] Pop a state from the stack, and make it the current state 
of the turtle. No line is drawn, although in general the 
position of the turtle is changed. If the stack is empty 
and no state can be popped, an error is reported and the 
string has no interpretation. 
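The two bracket commands amount to a small change in a turtle interpreter. The sketch below is self-contained and uses assumed names (`interpret`, `delta_deg`). The paper says only that an error is reported when the stack is empty, so raising `ValueError` is an illustrative choice.

```python
import math


def interpret(word, d=1.0, delta_deg=90.0):
    """Turtle interpretation extended with "[" (push state) and "]" (pop state)."""
    x, y, alpha = 0.0, 0.0, 0.0          # alpha is the heading in radians
    delta = math.radians(delta_deg)
    stack, lines = [], []
    for symbol in word:
        if symbol == "F":
            nx = x + d * math.cos(alpha)
            ny = y + d * math.sin(alpha)
            lines.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif symbol == "+":
            alpha += delta
        elif symbol == "-":
            alpha -= delta
        elif symbol == "[":
            stack.append((x, y, alpha))
        elif symbol == "]":
            if not stack:
                raise ValueError("']' with empty stack: string has no interpretation")
            x, y, alpha = stack.pop()    # jump back, no line is drawn
    return lines


print(interpret("F[+F]F"))
```

In the word F[+F]F the side branch is drawn and then abandoned, and the last F continues from the point where the branch began.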

The above use of brackets is consistent with the origi¬ 
nal definition of L-systems by Lindenmayer [1968] where 
brackets were used to specify branches (of algae). This idea 
was also preserved in L-systems generating plants and trees 
for computer imagery purposes [e.g. Smith 1984]. Some 
plants and trees generated by OL-systems and pL-systems 
under the extended turtle interpretation are shown in Fig. 3. 



Fig. 2. Examples of pictures generated by pL-systems under turtle
interpretation. (a) The Sierpinski "arrowhead". (b) The Gosper
space-filling curve. Both examples are taken from [Mandelbrot 1982].
(a) axiom YF; XF -> YF+XF+YF, YF -> XF-YF-XF
(b) n=4, d=5, $\delta = 60^\circ$; axiom XF;
XF -> XF+YF++YF-XF--XFXF-YF+, YF -> -XF+YFYF++YF+XF--XF-YF

Other graphical interpretation functions can also be
considered. In the case of chain interpretation [Freeman
1961] letters A, B, C and D may be interpreted as commands
moving the turtle to the left, up, to the right and
down, respectively. These movements change the position of
the turtle by distance $d$, and are independent of the turtle’s
heading. Line segments connecting the old and the new position
of the turtle are drawn. Under the chain interpretation,
some interesting curves are generated by very simple L-systems.
For example, the dragon curve (Fig. 1e) is generated
by a OL-system with axiom B and productions A -> AB,
B -> CB, C -> CD and D -> AD. Further extensions
of the interpretation functions are also possible. For example,
lines within a pair of parentheses may define the boundary
of a filled polygon (Fig. 4). The turtle can also be
allowed to move in three dimensions. An L-system will
then describe a three-dimensional object rather than a
two-dimensional picture.
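The chain interpretation is simpler than the turtle interpretation because no heading is kept. The sketch below applies it to the dragon-curve system quoted above; the names `MOVES` and `chain_lines` are illustrative, and treating "up" as the positive y direction is an assumption.

```python
# Unit displacement for each chain-code letter: left, up, right, down.
MOVES = {"A": (-1, 0), "B": (0, 1), "C": (1, 0), "D": (0, -1)}


def chain_lines(word, d=1):
    """Return the segments drawn for `word` under the chain interpretation."""
    x, y = 0, 0
    lines = []
    for letter in word:
        if letter in MOVES:
            dx, dy = MOVES[letter]
            nx, ny = x + d * dx, y + d * dy
            lines.append(((x, y), (nx, ny)))
            x, y = nx, ny
    return lines


# Axiom B with productions A -> AB, B -> CB, C -> CD, D -> AD.
dragon_chain = {"A": "AB", "B": "CB", "C": "CD", "D": "AD"}
word = "B"
for _ in range(2):
    word = "".join(dragon_chain[c] for c in word)
print(word, chain_lines(word))
```

Since each step is an axis-aligned move of length d, every picture produced this way is confined to a grid, which is the limitation the turtle interpretation removes.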



Fig. 3. Examples of plants and trees generated by OL-systems and
pL-systems under the extended turtle interpretation.
(a) n=5, d=1, $\delta = 25.7^\circ$; axiom F; F -> F[+F]F[-F]F
(b) n=5, d=3, $\delta = 22.5^\circ$; axiom [F];
[ < F > ] -> F-[[F]+[F]]+F[+F[F]]-[F], F -> FF
(c) n=6, d=2, $\delta = 25.7^\circ$; axiom G;
G -> GFX[+G][-G], X -> X[-FFF][+FFF]FX
(d or e) axiom [F]; F > ] -> F[+F]F[-F]+[F], F -> FF
(f) n=9, d=6, $\delta = 18^\circ$; axiom SLFFF;
S -> [+++G][-G]TS, G -> +H[-G]L, H -> -G[+H]L, T -> TL,
L -> [-FFF][+FFF]F

Fig. 4. Example of a plant generated by a pL-system.
Parentheses are grouping edges which
define boundaries of filled polygons.
n=4, d=8, $\delta = 30^\circ$; axiom T
T -> R+[T]--[--L]R[++L]-[T]++T
R -> F[--L][++L]F
L -> [{+FX-FX-FX+|+FX-FX-FX}]
FX -> FX
F -> FF


6. CONCLUDING REMARKS. 

This paper presents a technique for generating pictures 
consisting of two steps: 

• A string of symbols is generated with an L-system, 

• This string is interpreted graphically as a sequence of 
commands controlling a turtle. 

The notion of the L-system is generalized to include produc¬ 
tions in which predecessors may have an arbitrary length. 
The L-systems generalized this way are called pL-systems. 
Furthermore, the turtle is equipped with a pushdown stack 
which allows it to return to a previously marked position. 
The resulting mathematical model is used to define a large 
variety of fractals ranging from simple Koch curves popular¬ 
ized by Mandelbrot [1982], to classic space-filling curves, to 
relatively realistic-looking plants and trees. All these pictures 
are defined in a uniform and compact way. Consequently, 
L-systems can be used to define fractals in much the same way
as equations are used to define analytic curves in
Cartesian coordinates. In other words, the description of the
Sierpinski arrowhead by the productions XF -> -YF+XF+YF-,
YF -> +XF-YF-XF+ may be as "natural"
as the description of a circle by the equations $x = R \cos\theta$,
$y = R \sin\theta$.

Many problems related to the graphical applications of 
L-systems remain open. One possible direction of future 
research consists of finding L-systems and interpretation 
functions suitable for generating visually attractive images. 
As is often the case with fractals, these images can be 
appealing either because of their abstract beauty, or because
of their similarity to real-life objects. Another research
direction consists of exploring formal properties of L-systems
related to picture generation. This is parallel to the study of 
graphical applications of Chomsky languages initiated by 
Maurer, Rozenberg and Welzl [1982]. Example problems 
are: Given two pL-systems and an interpretation function, 
are the resulting pictures congruent? Given a pL-system and 
an interpretation function, is the resulting line closed? Is it 
self-intersecting or tree-like? Are there any segments drawn 
more than once? Are they drawn infinitely many times (if 
the derivation length tends to infinity)? What is the function 
relating the diameter of the picture to the derivation length? 
Solutions to these problems are interesting not only from the 
theoretical point of view. They would also be useful when 
constructing pL-systems in order to generate pictures with 
given properties.

FRACTALS, COMPUTERS AND DNA

ABSTRACT

The goal of science is to understand why things are 
the way they are. By emulating the logic of nature, 
computer simulation programs capture the essence of 
natural objects, thereby serving as a tool of science. 
When these programs express this essence visually, they 
serve as an instrument of art as well. 

This paper presents a fractal computer model of 
branching objects. This program generates pictures of 
simple orderly plants, complex gnarled trees, leaves, vein 
systems, as well as inorganic structures such as river del¬ 
tas, snowflakes, etc. More than just a visual simulation, 
this program models the growth process by mimicking 
the logic of an organism’s genetics. By manipulating the
genetic parameters, one can modify the geometry of the
object in real time, using tree-based graphics hardware.
The random effects of the environment are taken into 
account, to produce greater diversity and realism. 

The program provides a study in the structure of 
branching objects that is both scientific and artistic. The 
results suggest that organisms and computers deal with 
complexity in similar ways, and that the fractal nature 
of an organism has evolved as a critical means for the 
survival of the species. 

RESUME 

Le but de la science est de comprendre le pourquoi des
choses. En imitant la logique de la Nature, les logiciels
de simulation informatique permettent de cerner l’essence
des objets naturels, et deviennent ainsi des outils
scientifiques. Lorsque ces programmes de simulation
expriment leurs résultats de façon graphique, ils deviennent
aussi des modes d’expression artistique.

Cette communication présente un modèle informatique
pour la génération d’objets fractals arborescents. Le logiciel
permet de générer des images de plantes de faible
degré de complexité, des arbres noueux, des feuilles
d’arbres, des systèmes ramifiés, mais aussi des systèmes
du monde inerte comme des deltas de rivières, des
cristaux de neige, etc. Au-delà de la simple modélisation
visuelle, ce programme simule le processus de croissance
de ces formes en imitant la logique génétique de
ces organismes. En manipulant les divers paramètres de
ce code génétique, on peut contrôler en temps réel la
géométrie de l’objet, grâce à l’exploitation d’un matériel
câblé pour la gestion de structures de données en arbre.
Les perturbations aléatoires rencontrées dans les formes
de croissance réelles contribuent à renforcer le réalisme
des images générées et augmentent la diversité des
formes ainsi produites.

Le logiciel permet d’étudier des objets à structure
arborescente aussi bien du point de vue scientifique que
du point de vue artistique. Les résultats obtenus suggèrent
que les organismes vivants et les ordinateurs
présentent certaines analogies vis-à-vis de la gestion de
structures de croissance complexes, et que la nature
fractale de certains organismes a évolué vers un équilibre
optimum permettant la survie de ces espèces.


1. INTRODUCTION 

Benoit Mandelbrot recognized that the relationship
between large scale structure and small scale detail is an
important aspect of natural phenomena. He gave the
name fractals to objects that exhibit an increasing amount
of detail as one zooms in closer [9][10]. If the small scale
detail resembles the large scale detail, the object is said
to be self-similar.

The geometric notion of fractal self-similarity has
become a paradigm for structure in the natural world.
Nowhere is this principle more evident than in the world
of botany. Recursive branching at many levels of scale
is the primary mechanism of growth in most plants.
Analogously, recursive branching algorithms are fundamental
to computers. Many high performance processing
engines specialize in tree data structures.

Computer generation of trees has been of interest for
several years now. Examples of computer generated
trees include:

Benoit Mandelbrot (1977)(1982) [9][10]
Marshall, Wilson, Carlson (1980) [11]
Yoichiro Kawaguchi (1982) [8]
Geoff Gardiner (1984) [7]
Aono, Kunii (1984) [2]
Alvy Ray Smith, Bill Reeves (1984)(1985) [17][14]
Jules Bloomenthal (1985) [4]
Demko, Hodges, Naylor (1985) [6]


2. THE TREE MODEL

The tree model presented in this paper has the following
features:

• A detailed parameterization of the geometric relationship
between tree nodes.

• Real Time Design and Animation of tree images using
high performance hardware.

• Application of Stochastic (random) Modeling to both
topological and geometric parameters.

• Stochastic modeling of tree bark.

• High Resolution (2024 x 1980) Shaded 3d Renderings.

Here’s how the model works:

This program implements a recursive tree model.
Each tree generated satisfies the following recursive tree
node definition:

    tree :=
    {
        Draw Branch Segment
        if (too small)
            Draw leaf
        else
        {
            # Continue to Branch
            {
                Transform Stem
                "tree"
            }
            repeat n times
            {
                Transform Branch
                "tree"
            }
        }
    }

Paraphrased, a tree node is a branch with one or
more tree nodes attached, transformed by a 3x3 linear
transformation. Once the branches become small
enough, the branching stops and a leaf is drawn. The
trees are differentiated by the geometry of the transformations
relating the node to the branches and the topology
of the number of branches coming out of each node.

These branching attributes are controllable by a set of
numerical parameters. Editing the parameters changes
the tree’s appearance. The parameters include:

• The angle between the main stem and the branches

• The ratio of the main stem to the branches

• The rate at which the stem tapers

• The amount of helical twist in the branches

• The number of branches per stem segment

Figure 1 shows a simple example with mostly default 
parameters. 



Figure 1. stem/stem ratio = .8, branch/stem ratio = .4,
branching angle = 60°
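The recursive tree node definition can be sketched in Python as below. This is a flat 2D simplification, not the paper's 3x3 linear transformations: each transformation is reduced to a scale and a rotation, and the names `TreeParams`, `grow` and `min_length` are assumptions. The default ratios and angle are the Figure 1 values, and alternating branches left and right is an illustrative choice.

```python
import math
from dataclasses import dataclass


@dataclass
class TreeParams:
    stem_ratio: float = 0.8          # stem/stem ratio
    branch_ratio: float = 0.4        # branch/stem ratio
    branch_angle_deg: float = 60.0   # branching angle
    n_branches: int = 2              # branches per stem segment
    min_length: float = 0.05         # "too small" threshold (assumed)


def grow(x, y, heading, length, params, segments, leaves):
    """Draw one branch segment, then recurse into the stem and the branches."""
    nx = x + length * math.cos(heading)
    ny = y + length * math.sin(heading)
    segments.append(((x, y), (nx, ny)))          # Draw Branch Segment
    if length < params.min_length:               # too small: draw a leaf
        leaves.append((nx, ny))
        return
    # Transform Stem, then "tree"
    grow(nx, ny, heading, length * params.stem_ratio, params, segments, leaves)
    # repeat n times: Transform Branch, then "tree"
    angle = math.radians(params.branch_angle_deg)
    for k in range(params.n_branches):
        side = 1 if k % 2 == 0 else -1           # alternate left and right
        grow(nx, ny, heading + side * angle,
             length * params.branch_ratio, params, segments, leaves)


segments, leaves = [], []
grow(0.0, 0.0, math.pi / 2, 1.0, TreeParams(min_length=0.5), segments, leaves)
print(len(segments), len(leaves))
```

Lowering `min_length` deepens the recursion, and the number of segments grows roughly geometrically with depth.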


3. RANDOM NUMBERS IN FRACTAL MODELING

If the parameters remain constant throughout the
tree, one gets a very regular looking tree such as a fern.
This tree is strictly self-similar; that is, the small nodes
of the tree are identical to the top level largest node of
the tree.

If the parameters vary throughout the tree, one gets
an irregular gnarled tree such as a juniper tree. In order
to achieve this, each parameter is given a mean value
and standard deviation. At each node of the tree, the
parameter value is regenerated by taking the mean value
and adding a random perturbation, scaled by the standard
deviation. The greater the standard deviation, the
more random, irregular, and gnarled the tree. The resulting
tree is statistically self-similar, not strictly
self-similar.

There are several reasons for the stochastic 
approach. First, adding randomness to the model gen¬ 
erates a more natural looking image. Large trees have 
an intrinsic irregularity (caused in part by turbulent 
environmental effects). Random perturbations in the 
model reflect this irregularity. Second, random perturba¬ 
tions reflect the diversity in nature. A single set of tree 
model parameters can generate a whole forest of trees, 
each slightly different. This increased database 
amplification is one of the hallmark features of fractal 
techniques. 
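The per-node regeneration of parameters can be sketched as follows. The text does not specify the distribution of the perturbation, so the unit Gaussian used here is an assumption. The names `Param` and `sample_tree`, and the numeric means and deviations, are likewise illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class Param:
    mean: float
    std: float

    def sample(self, rng):
        # Mean value plus a random perturbation scaled by the standard deviation.
        return self.mean + self.std * rng.gauss(0.0, 1.0)


def sample_tree(params, n_nodes, seed):
    """Draw one parameter set per tree node; the seed selects the individual tree."""
    rng = random.Random(seed)
    return [{name: p.sample(rng) for name, p in params.items()}
            for _ in range(n_nodes)]


params = {"branch_angle": Param(60.0, 10.0), "branch_ratio": Param(0.4, 0.05)}
tree_a = sample_tree(params, 5, seed=1)
tree_b = sample_tree(params, 5, seed=2)
print(tree_a[0], tree_b[0])
```

Changing only the seed yields a different tree from the same parameter set, while a standard deviation of zero reproduces the strictly self-similar case.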

4. MODELING STEM SHAPE 

By stripping all of the branches one is left with just 
a stem. By varying the transformation between stem 
segments, one derives the class of spirals and helixes and 
their random perturbations. These shapes appear in all 
forms of growth, organic and inorganic — from the inner 
ear, sea shells, and plant sprouts, to spiral galaxies. 
Spirals and helixes are in some sense degenerate self simi¬ 
lar sets. They are the atomic units that make up the 
fractal trees. 

Figure 2 shows 4 typical stem shapes.

a) cylinder: the transformation is a translation and a
scale.

b) spiral: one performs a rotation perpendicular to the
stem axis, in addition to a scale and translation.

c) helix: one performs an additional rotation along the
stem axis.

d) squiggle: by randomly changing the transformation
from segment to segment, case c) becomes case d).


Branches are simply stem shapes attached to the main 
stem and each other. 

5. RENDERING THE FRACTAL 

A variety of geometric elements can be used to 
render the branch segments. The simplest primitive is 
one single vector line per tree node. Antialiased vector 
lines with variable thickness allow one to taper the 
branches towards the tip. This method is satisfactory 
for leaves, ferns and other simple plants, for small scale 
detail in complex scenes, or for more abstract stylized 
images. Varying the vector color provides depth cuing 
and shading, and can also be used to render blossoms or 
foliage. Antialiased vectors are similar to particle sys¬ 
tems used by Bill Reeves in his forest images. [14] 

The thicker branches of a tree require a shaded 3d 
primitive. Bump mapped polygonal prisms are used to
flesh out the trees in 3d. The program makes sure that 
the polygons join continuously along each limb. The 
branches emanating from a limb simply interpenetrate 
the limb. For a more curvilinear limb shape, one can 
link several prisms together between branch points. 

Shaded polygon limbs are far more expensive than 
antialiased vectors. Since the number of branches 
increases exponentially with branching depth, one can 
spend most of an eternity rendering sub pixel limb tips, 
where bark texture and shading aren’t visible anyway. 
In addition to being faster, sub pixel vectors are easier to 
antialias than polygons on our available rendering pack¬ 
age. So for complex trees with a high level of branching 
detail, polygonal tubes were used for the large scale 
details, changing over to vectors for the small stuff. One 
can notice the artifacts of this technique. Overall, how¬ 
ever, the eye ignores this inconsistency if the cutover 
level is deep enough. Thinner branches require fewer 
polygons around the circumference; in fact triangular 
tubes will do for the smallest branches. 

6. BARK

Sawtooth waves modulated by Brownian fractal
noise are the source for the simulated bark texture. Bark
is generated by adding fractal noise to a ramp, then
passing the result through a sawtooth function. A close
up view of the bark would look like the ridges of a frac¬ 
tal mountain range. By adding the noise before the 
sawtooth function, the crests of the sawtooth ridges 
become wiggly. 
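The order of operations described above (ramp, plus noise, then sawtooth) can be sketched as follows. The noise function here is only a stand-in: it sums a few sine octaves and is not Brownian fractal noise. All names and constants (`bark_height`, `ridges`, `noise_amplitude`, the sine coefficients) are assumptions for illustration.

```python
import math


def sawtooth(t):
    """Fractional part of t: rises from 0 towards 1, then drops back."""
    return t - math.floor(t)


def stand_in_noise(u, v, octaves=4):
    """Placeholder for fractal noise: halved-amplitude, doubled-frequency sines."""
    total, amplitude, frequency = 0.0, 0.5, 1.0
    for _ in range(octaves):
        total += amplitude * math.sin(frequency * (12.9898 * u + 78.233 * v))
        amplitude *= 0.5
        frequency *= 2.0
    return total


def bark_height(u, v, ridges=4.0, noise_amplitude=0.3, noise=stand_in_noise):
    """Add noise to a ramp across u, then pass the sum through the sawtooth."""
    ramp = ridges * u
    return sawtooth(ramp + noise_amplitude * noise(u, v))


row = [round(bark_height(i / 16, 0.25), 3) for i in range(16)]
print(row)
```

Because the noise is added before the sawtooth, it shifts where each ridge crest falls rather than roughening the slope, which is what makes the crests wiggle.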

7. REAL TIME FRACTAL GENERATION 

Complex tree images can take 2 hours or more to
render on a VAX 780. Editing tree parameters at this
rate is not very effective. Near real time feedback is
needed to allow one to freely explore the parameter
space and design the desired tree. Since vertex transformation
cost is high for a complex fractal tree, hardware
optimized for linear transformations was used for the
real time editor. The Evans and Sutherland
MultiPictureSystem generates vector drawings of complex 3d
display lists in near real time. The display lists on the
MPS look a lot like our tree nodes: primitive elements,
transformed by linear transformations, and linked by
pointers to other nodes. For a strictly self-similar tree,
the transformation is constant, therefore the entire
display list can share a single matrix. To modify the
tree one only has to change this one matrix, rather than
an entire display list. This makes updating the tree
display list very fast. For non-strictly self-similar trees,
the transformations are not the same. However, a lookup
table of less than a dozen transformations is adequate to
provide the necessary randomness [17].

Figure 3 illustrates the logic of the display list. 

The left side of the display list contains the topological
description of the tree. Each node contains pointers
to offspring nodes plus a pointer to the transformation
matrix which relates that node to its parent node. This
part of the display list is purely topological: it contains
no geometric data. The geometric information is contained
in the small list of transformation matrices on
the right. To edit a tree, one can create an arbitrarily
large topology list once and then rapidly manipulate only
the small geometry list. Alvy Ray Smith, in his research
on graftals, recognized the separation of the topological
and geometric aspects of trees. He calls these
components the graph and the interpretation, respectively.
His work deals primarily with specification of the
tree topology, ignoring interpretation for the most part
[17]. The paper presented here emphasizes the geometric
interpretation. The thesis of this paper is that the key
to realistically modeling the diversity of trees lies in
controlling the geometric interpretation. Many different
topologies were used in this project, but by varying the
geometric interpretation of a single topology, one could
still generate a wide variety of trees, each with its own
distinct taxonomic identity.
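
The split between a large topology list and a small geometry list can be illustrated with a short Python sketch. It is not the MPS display list format: the 2D affine tuples, the `Node` class and the helper names are assumptions made for the example, and the real system works with 3D matrices in hardware.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

# 2D affine transform stored as (a, b, c, d, tx, ty):
#   x' = a*x + b*y + tx,  y' = c*x + d*y + ty
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def branch_matrix(scale, angle_deg):
    """Scale and rotate a child, then move it to the parent's tip at (0, 1)."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return (c, -s, s, c, 0.0, 1.0)


def compose(m, n):
    """Return the transform that applies n first, then m."""
    a, b, c, d, tx, ty = m
    e, f, g, h, ux, uy = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h,
            a * ux + b * uy + tx, c * ux + d * uy + ty)


def apply(m, p):
    a, b, c, d, tx, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


@dataclass
class Node:
    """Topology only: an index into the matrix table plus child pointers."""
    matrix_index: Optional[int]          # None for the root (no parent)
    children: List["Node"] = field(default_factory=list)


def build_topology(depth, branching=2):
    """Build the (possibly large) topology list once; child k uses table entry k."""
    root = Node(None)

    def grow(node, level):
        if level == depth:
            return
        for k in range(branching):
            child = Node(k)
            node.children.append(child)
            grow(child, level + 1)

    grow(root, 0)
    return root


def segments(node, table, world=IDENTITY):
    """Walk the topology, looking up geometry in the small matrix table."""
    out = [(apply(world, (0.0, 0.0)), apply(world, (0.0, 1.0)))]
    for child in node.children:
        out += segments(child, table, compose(world, table[child.matrix_index]))
    return out


tree = build_topology(depth=3)                       # built once
table = [branch_matrix(0.7, 30.0), branch_matrix(0.7, -30.0)]
drawing = segments(tree, table)
table[0] = branch_matrix(0.5, 60.0)                  # edit geometry only
redrawn = segments(tree, table)                      # same topology, new shape
```

Editing one entry of `table` reshapes every branch that refers to it, while the `tree` object is never rebuilt, which is the property the text relies on for fast updates.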

The real time generation of fractal trees has been
packaged as an interactive editing system. This
multi-window system allows one to edit the tree parameters
(both geometric and topological) via graphically
displayed sliders. A vector image of the tree responds in
real time. To see all the parameters change at once, one
performs keyframe interpolation of the parameters.
Each tree parameterization is written to a keyframe file.
A cubic spline program interpolates these parameters to
create the inbetween frames. In the resulting animation,
the tree metamorphoses from key to key. A simple tree
growth animation is achieved by interpolating the trunk
width parameter and the recursion size cutoff parameter.
Modifying additional parameters makes the growth
more complex and natural looking. For example, many
plants uncurl as they grow. A metamorphosis animation
is achieved by interpolating parameters from different
tree species. Growing and metamorphosing tree animations
appear in The Palladium animation produced at
NYIT [1].
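
As a rough illustration of the keyframe step, the sketch below interpolates every tree parameter between keys with a cubic spline. The text does not say which cubic spline was used; a Catmull-Rom spline is assumed here because it passes through each key, and the parameter names `trunk_width` and `size_cutoff` are placeholders for the trunk width and recursion size cutoff mentioned above.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic Catmull-Rom interpolation between p1 (t=0) and p2 (t=1)."""
    return 0.5 * (2.0 * p1
                  + (p2 - p0) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t ** 3)


def inbetween(keys, steps):
    """Generate frames from a list of keyframes.

    keys  -- list of dicts, all with the same parameter names
    steps -- number of frames generated per pair of adjacent keys
    The first and last keys are repeated as their own outer neighbours.
    """
    frames = []
    last = len(keys) - 1
    for seg in range(last):
        k0 = keys[max(seg - 1, 0)]
        k1, k2 = keys[seg], keys[seg + 1]
        k3 = keys[min(seg + 2, last)]
        for step in range(steps):
            t = step / steps
            frames.append({name: catmull_rom(k0[name], k1[name],
                                             k2[name], k3[name], t)
                           for name in k1})
    frames.append(dict(keys[last]))      # finish exactly on the last key
    return frames


# Hypothetical growth animation: the trunk thickens while the cutoff shrinks.
keys = [
    {"trunk_width": 0.1, "size_cutoff": 0.50},
    {"trunk_width": 0.4, "size_cutoff": 0.20},
    {"trunk_width": 0.9, "size_cutoff": 0.05},
]
frames = inbetween(keys, steps=4)
```

Because every parameter is interpolated in the same pass, keys taken from two different species would produce a metamorphosis in exactly the same way as a growth sequence.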

The above method for real time fractal animation
on the Evans and Sutherland Multi-Picture System using
vectors is currently being transported to the
CGL/Trillium real time smooth shaded polygon rendering
system.

8. FRACTALS, COMPUTERS, AND DNA 

The economic advantage of this program is that a
highly complex structure is generated from a simple,
concise kernel of data which is easy to produce. (Such large
database amplification is a primary advantage of fractal
techniques in general.) How does the representation of
complexity by computers compare with complex expression
in nature itself?

Presumably, complexity in nature has evolved
because it can bestow benefits on an organism. But, as
with computers, complexity must not be a burden. An
organism must be simple to build, simple to describe.
After all, a mere picogram of DNA serves as the blueprint
for animals weighing tons. Genetic economy
demands that intricate structures be summarized simply.
This struggle to simplify genetic requirements determines
the geometric structure of the plant. Form follows
genetic economics.

This suggests a natural explanation as to why
self-similarity abounds in the natural world: evolution
has resolved the tension between complexity and simplicity
in the same way that computer scientists have, namely
with recursive fractal algorithms. If fractal techniques
help the computer resolve the demands of database
amplification, then presumably organisms can benefit as
well. For genes and computers alike, self-similarity is
the key to thrifty use of data.

The parameters of the tree program are numerical
counterparts to the DNA code that describes a tree’s
branching characteristics. The logic of the gene is mimicked,
although the mechanism is different. The early
stages of the model contained only 3 changeable parameters.
The resulting images were of very simple fern-like
plants. New species were generated by controlling the
parameter values, rerolling the dice of mutation and then
selecting the forms that would be allowed to proliferate.
As the model became more complex with the inclusion of
more parameters, the program created images of more
genetically complex trees such as cherry trees, higher on
the evolutionary scale. Whereas natural selection of
organisms is based on survival value, this aesthetic selection
is based upon resemblance to the forms of
nature.

The actual computer program did not take very 
long to develop, just as it did not take long for plants to 
develop the ability to branch. Expanding the parameter 
space, and creating a diverse database, however, required 
turning the dials throughout all four seasons. This 
parameter space represents a more evolved instance of 
previous tree parameterizations.

9. CONCLUSION

Of course, any scientific model is simply an
attempted translation of nature into some quantifiable
form. The success of the model is measured by some
quantifiably predictive result. In experimental science,
the success of a theory is measured by the degree to
which the predictive model matches experimental data.
Computer graphics now provides another style of predictive
modeling. The success of a computer simulation is
reflected in how well the image resembles the object
being modeled. If the picture looks like a cherry tree,
this suggests that the model is “correct.” If one can
model a complex object through simple rules, one has
mastered the complexity. What appeared to be complex
proves to be primitive in the end. And the proof
(although subjective) is in the picture.

REFERENCES 


[1] Allen, R., Oppenheimer, P., The Palladium (Video), 
New York Institute of Technology, 1985 

[2] Aono, M., Kunii, T.L., Botanical Tree Image Generation,
IEEE Computer Graphics and Applications, Vol.
4, No. 5, May 1984.

[3] Bentley, W.A., Humphreys, W.J., Snow Crystals,
Dover Publications Inc., New York, 1962 (Originally
McGraw Hill, 1931).

[4] Bloomenthal, J., Modeling the Mighty Maple,
Computer Graphics, Vol. 19, No. 3, July 1985.

[5] Cole, V.C., The Artistic Anatomy of Trees, Dover
Publications Inc., New York, 1965 (Originally Seeley
Service & Co, London, 1915).

[6] Demko, S., Hodges, L., Naylor, B., Construction of
Fractal Objects with Iterated Function Systems,
Computer Graphics, Vol. 19, No. 3, July 1985.

[7] Gardner, G., Simulation of Natural Scenes Using
Textured Quadric Surfaces, Computer Graphics, Vol. 18,
No. 3, July 1984.

[8] Kawaguchi, Y., A Morphological Study of the Form of
Nature, Computer Graphics, Vol. 16, No. 3, July 1982.

[9] Mandelbrot, B., Fractals: Form, Chance and Dimension,
W.H. Freeman and Co., San Francisco, 1977.

[10] Mandelbrot, B., The Fractal Geometry of Nature,
W.H. Freeman and Co., San Francisco, 1982.

[11] Marshall, R., Wilson, R., Carlson, W., Procedural
Models for Generating Three-Dimensional Terrain,
Computer Graphics, Vol. 14, No. 3, July 1980.

[12] Oppenheimer, P., Constructing an Atlas of Self Similar
Sets (thesis), Princeton University, 1979.

[13] Oppenheimer, P., The Genesis Algorithm, The Sciences,
Vol. 25, No. 5, 1985.

[14] Reeves, W., Particle Systems—A Technique for 
Modeling a Class of Fuzzy Objects, Computer Graphics,
Vol. 17, No. 3, July 1983. 

[15] Reynolds, C., Arch Fractal, Computer Graphics
(Front Cover), Vol. 15, No. 3, August 1981.

[16] Serafini, L., Codex Seraphinianus, Abbeville, New
York, 1983.

[17] Smith, A.R., Plants, Fractals, and Formal
Languages, Computer Graphics, Vol. 18, No. 3, July
1984.

[18] Stevens, P.S., Patterns in Nature, Little, Brown, and
Co., Boston, 1974.





Fallen Leaf - An exaggerated image of vein 
branching in a fall leaf. The external boundary 
shape is the limit of growth of the internal 
veins. (512x480) 


Raspberry Garden at Kyoto - The leaves
and branches are generated by the fractal
branching program. Berries are based on symmetry
models by Haresh Lalvani, with added
random perturbations. Non-self similar features
such as berries require genetic specialization.
(1024x960)


Blossomtime - The bark of this cherry tree is
made with bump mapped polygonal tubes.
The blossoms are colored vectors (particle
systems). Instead of modeling blossoms, one can
simply dip the branches in pink paint.
(1024x960)

Views II - By randomly perturbing the 
branching parameters, one generates a more 
naturalistic gnarled tree. (2048x1920)

Abstract

In designing and constructing computer vision systems,
many crucial issues need to be addressed. Foremost
of these are the control and organization of the
visual information processing tasks involved, and the
representation and usage of both knowledge and data. As
computer vision systems have evolved, growing in
complexity and size, these issues have become increasingly
important to their overall success. In this paper, a
recent and increasingly popular approach to image
understanding, the knowledge-based system, is presented as
a framework in which to deal with these issues. The
engineering of a computer vision system as a
knowledge-based system is discussed, along with these issues,
in the context of our evolving system.

Résumé

In designing and implementing a computer vision
system, several critical questions must be considered.
Chief among them are the control and organization of the
visual information processing tasks, and the representation
and use of data and knowledge. Because computer vision
systems have grown in size and complexity, their success
depends more and more on these questions. In this article,
a new and increasingly popular approach to image
understanding, the knowledge-based system, is presented as
a framework for addressing these questions. The
realization of a computer vision system by means of a
knowledge-based system, together with these questions, is
treated in the context of our evolving system.


1. Introduction 

A visual technology capable of replicating human vision
is the ultimate achievement for computer vision. To
be able to accomplish such a feat would require a far
superior understanding of the functioning of the human
visual system. Moreover, this would require the embodiment
of intelligence in a machine. Undaunted by these
severe limitations in understanding, computer vision has
developed over the past twenty-five years in a somewhat
ad hoc fashion. The growth of this infant technology
in conjunction with its maternal science of artificial
intelligence has led to the emergence of computer vision
systems. Although they are far from being general vision
systems¹, they are at present the best and only available
artificial approximation.

The earliest computer vision system, pioneered in the
mid-1960’s by Roberts [Roberts65], was capable of analyzing
simple polyhedral scenes and matching the located
polyhedra to stored models. Since then, computer vision
systems have attained greater complexity due to the
increasingly complex scenes being analyzed, as witnessed
in the prominent systems of today. (See [Binford82] and
[Shapiro83] for surveys of some of these systems.) In
association with this increase in complexity, the control
and organization of these systems have evolved from simple
sequential bottom-up or top-down mechanisms into
complex structures involving many levels of cooperative
processes, as the amount of knowledge required to reason
about the analysis increases. As these complex visual
information processing systems become more ambitious,
it is evident that the organization and control aspects
will also become increasingly significant to
their overall success.

Control of vision systems has tended to be heavily
embedded within the organization of the visual processes.
Such procedural methods are reliable and fast, but are
very rigid in that they are application specific. Subject
to variations in the goal description or the task domain,
the appropriate alterations to the procedural knowledge
may become a major task. Also, if the images to be analyzed
consist of complex structures and great intra-class
variations, a sequence of analysis cannot be reliably
predetermined. Thus the analysis is necessarily data-driven,
implying the need for a flexible and adaptive control
structure.

¹ By general it is meant in the same sense as the human visual
system, capable of multiple objectives in a dynamic,
unconstrained and complex visual environment.

This paper presents a recent and increasingly popular
approach to the organization of a computer vision
system, permitting a greater degree of control flexibility
and, subsequently, functional generality. The paradigm
presented is that of a knowledge-based system.

2. The Knowledge-Based Approach 

A significant result in the first twenty years of artificial
intelligence research is the fact that the principal
requirement for intelligence is knowledge. By the
mid-1970’s AI began shifting from a power-based strategy
towards a knowledge-based approach in an attempt to
achieve intelligence. The power strategy looked towards
a generalized increase in computational power to resolve
the problems that the current techniques faced, whereas
the knowledge strategy viewed progress as being achieved
through better ways of recognizing, representing and utilizing
diverse and specific forms of knowledge. The fundamental
problem of understanding intelligence is no longer
the identification of power-based techniques, but rather
a question of how to represent vast amounts of knowledge
in a manner which permits their effective use and
interaction.

A powerful tool that has emerged from this shift of
focus in AI is the knowledge-based system, which is a
problem solving system that applies knowledge about a
specific domain to solve practical problems [Sowa84]. A
class of knowledge-based systems known as expert systems
has recently received much attention
[Waterman.Hayes-Roth&Lenat83].

Knowledge-based systems have either adopted or developed
programming styles where there exists a clear
distinction between knowledge and its use (for an
introduction to and survey of a few of the existing tools, see
[Waterman.Hayes-Roth&Lenat83], pp. 169-215). This
separation of control flow from its knowledge permits
modular extensions to a system's capabilities. The knowledge
engineering tools that have emerged employ principles
of knowledge representation and a related inference
mechanism for bringing knowledge to bear on a problem.
Knowledge about the problem domain and
self-knowledge are stored in a knowledge base using a
representational framework. Current representational frameworks
include rule-based, frame-based and logic-based
schemes [Buchanan&Duda83]. Facts or data about the
particular problem and processing are stored in a global
database. The system retrieves knowledge pertinent to
the problem and utilizes symbolic reasoning to make
inferences about the facts in the global database to solve
the problem at hand.
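
To make the separation between knowledge and its use concrete, the following Python sketch keeps rules in a knowledge base, facts in a global database, and a generic forward-chaining loop as the inference mechanism. It is a minimal illustration, not the design of any system cited here; the `Rule` class, the rule names and the fact strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One piece of knowledge: if all conditions hold, assert the conclusion."""
    name: str
    conditions: frozenset
    conclusion: str


def forward_chain(knowledge_base, database):
    """Generic inference mechanism; it knows nothing about the domain.

    Repeatedly fires any rule whose conditions are all present in the
    global database, until no rule can add a new fact.
    Returns the final set of facts and the names of the rules fired, in order.
    """
    facts = set(database)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in knowledge_base:
            if rule.conditions <= facts and rule.conclusion not in facts:
                facts.add(rule.conclusion)
                fired.append(rule.name)
                changed = True
    return facts, fired


# Hypothetical knowledge base; extending the system means appending rules,
# not editing forward_chain().
knowledge_base = [
    Rule("label-outdoor", frozenset({"region-is-sky"}), "scene-is-outdoor"),
    Rule("label-sky", frozenset({"region-is-blue", "region-at-top"}),
         "region-is-sky"),
]
global_database = {"region-is-blue", "region-at-top"}
facts, fired = forward_chain(knowledge_base, global_database)
```

Note that the order in which the rules fire follows from the facts present rather than from the order in which the rules are written, which is the sense in which such a system is data-driven.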

Although one of the first domains of research in AI to
incorporate knowledge was computer vision, improvement
in this application has been slow and limited.
The application of knowledge has been restricted
to domain specific knowledge of the scene analyzed in
model-based vision. However, the use of world knowledge
has been weak [Binford82]. There is now interest
in the computer vision community in applying
knowledge-based system techniques to improve this level of
processing [Matsuyama84, Nagao82].

As complex and large as current computer vision systems
are, they are very limited in their abilities
[Binford82, Matsuyama84]. Much effort of late has been
directed towards improving and understanding specific vision
tasks, particularly in low level vision [Brady82]. A
major emphasis in this work has been on the use
of physical knowledge - knowledge about the physical
world and the laws that govern it. Shape from shading
and stereo vision, for example, use knowledge about the
imaging process to recover 3D shape from projected 2D
image features. More recently, another level of knowledge
has been introduced in computer vision systems: perceptual
knowledge - knowledge used to group image features
into aggregates. The basis for this knowledge comes
from Gestalt laws of visual grouping
[Zucker.Rosenfeld&Davis75]. Such knowledge has been successfully applied
in refining low level segmentations [Nazif83] and forming
perceptual groupings from 2D image features as the
basis for 3D object recognition [Lowe84].

Apart from the need to improve every facet of the
image analysis process, there is also a need to increase
the overall intelligence of these image understanding systems
[Rosenfeld82, Matsuyama84]. The capacity of intelligence
implies the ability to reason about the image
analysis and the scene. Rosenfeld identifies a lack of a
general theory of control in image analysis, i.e. there exist
no general principles describing how vision processes
should interact in performing a particular task. He also
identifies a lack of a general theory of how to combine
evidence from multiple sources of information available
in performing a particular task. Such general purpose
knowledge is imperative if hopes of achieving a general
vision system are to be satisfied.

To achieve functional generality, a computer vision
system must be capable of performing a variety of tasks.
Upon specification of a particular task, the system must
be able to determine the necessary processing modules,
parameters and control strategy for performing the task.
Given the requirement of being able to analyze a wide
variety of complex images, this cannot be rigidly specified
a priori. The system should possess the ability to
evaluate its performance at various stages of processing
and be capable of adaptively improving it (whether it
be by modifying parameters, modifying the control flow,
integrating information, augmenting processes, or other
mechanisms). Thus, it is necessary that the image being
analyzed and its many abstractions dictate the processing
flow and, consequently, how the vision processes should
interact. The control of the image analysis is therefore
necessarily data-driven. This type of flexible control is
easily realizable in a knowledge-based methodology.

Ultimately, computer vision must address the important
issue of integration of evidence from multiple
sources, especially in view of the increased sophistication
in applications and the need for improved performance.
This is especially desirable since descriptions produced
by computer vision techniques are incomplete and often
imprecise, stemming from the inherent ambiguities that
arise in an image. For example, consider the problem
of image segmentation, where partitions may be obtained
from several measurable or extractable properties such as
colour, luminance, texture or edges. In typical computer
vision applications the "best" technique for segmenting
the image, based on a single property, is often used to
build an intermediate representation² for the higher level
processes. This "best" technique is often arrived at by
trying a set of techniques and deciding on the best. However,
it is necessary in a general system, where the "best"
technique is not definable, to have a larger number of
techniques available, and in some way be capable of integrating
the results of these techniques into a "best" possible
usable intermediate representation. Integration of
this nature can be viewed as a refinement process which
operates on local extracted features. Nazif [Nazif83] has
demonstrated the refinement of low level segmentations
using a rule-based mechanism to represent processing
knowledge for integrating information from a line-based
and a region-based segmentation. Note that the integration
of information can also be useful in the refinement
of the interpretation or recognition processes.

Given the importance of knowledge in image analysis,
the engineering of a computer vision system as a
knowledge-based system is very appealing. However, to
have successful systems, the knowledge levels (physical,
perceptual, domain and processing) must be further enhanced
and this knowledge must be more effectively
applied. Also, an appropriate knowledge engineering tool
for vision applications must be formalized.

3. Our System 

The aim of our system is to build a general purpose
tool for experimenting with various approaches to image
analysis. Constructing the system as a knowledge-based
system permits us the flexibility to do so. In such a
system, where there is a distinct separation between its
knowledge and the mechanisms that apply it, the task
domain or its goals may be changed easily and, as the
system evolves, the modular extensibility of its capabilities
by simply augmenting its knowledge is attractive.
Equipped with a large set of visual processing algorithms
and modules, by setting up the task domain and selecting
the appropriate analysis strategy, this computer vision
system can attain a greater degree of functional generality
and utility. Also, due to its data-driven nature, this
system can be attentive to the processing requirements
as dictated by the image, demonstrating the capacity of
dynamic control [Levine&Nazif85b].

² Intermediate representation is the general term used to describe
the representations produced at various stages of processing
between the signal (image) and the semantic (scene) levels.
For our purposes, by intermediate representation we mean the
principal representation that is used by the interpretation (high
level) process.

The basic computer vision system is identified as
consisting of two major processing modules performing
the low level or early processing and the high level or
cognitive processing. Low level processing is concerned with
extracting image features and structures to build an
intermediate representation. The principal task of the high
level process is to match object models with structures
described in the intermediate representation. Achieving
object recognition or scene interpretation is the product
of both of these levels of processing. A meta supervisor
coordinates the interaction and flow of information and
processing between both processes. This simple organization
is depicted in Figure 1. We follow the doctrine of
separating the domain independent knowledge from the
domain dependent knowledge in this form of dichotomy.



Figure 1 Basic System Structure 


The organization of this system is presented in this
fashion to express the flexibility and generality permitted
by the knowledge-based system paradigm. The
interaction between the low level processor
and the high level processor may be simply a one pass
sequential flow or be governed by a hypothesis-verification
paradigm; this arrangement permits either.
The point is not to mask the control structure but to
emphasize that a knowledge-based approach permits greater
flexibility in control. Changes in control strategy require
only alterations in the meta control knowledge embedded
in the meta supervisor or its usage, as opposed to the major
reorganization necessary in a more conventional procedural
control structure. Conceptually, the low level and high
level processors and their respective subtasks are viewed
in the same manner. For example, the low level processor
has its meta supervisor controlling its subprocesses and
similarly these subprocesses have supervisors controlling
their respective subprocesses. Each of these
processes is itself a self-contained knowledge-based
system. Organizing the system in this fashion suggests
a natural pyramid or tree hierarchy for the control of the
entire system.

4. Our Current Work

A system of the nature described above is currently
evolving at the Computer Vision and Robotics Laboratory
at McGill University. The knowledge representation
framework chosen for the system implementation was
a rule-based methodology, and OPS5, a production system
language [Forgy81, Dill&Hong84], was selected as
the knowledge engineering tool. This latter choice was
based primarily on availability.

A low level processor based on Nazif's low level segmentation
expert [Nazif83] has been implemented and is
currently being tested. Extensions to the capabilities of
this system are currently being implemented. Work will
be initiated soon on the high level processor.

The low level processor possesses the ability of
non-purposive segmentation. A final partitioning of an image
is obtained from the integration of initial region- and
line-based segmentations. This integration is facilitated
by the three knowledge sources which comprise the segmentation
module: the line, region and area analyzers
(see Figure 2). Each of these analyzers consists of rules
which reason about the entities extracted from the image,
i.e. lines, regions and areas of attention. These heuristics
are domain independent, being based on the principles of
visual grouping [Nazif83, Zucker.Rosenfeld&Davis75]. In
addition to the heuristics needed to achieve the segmentation,
some knowledge about how to apply them is
also required. Hence the control problem.



Figure 2 The Segmentation Module 


Control is effected by dynamically setting strategies
for the processing of areas, regions and lines. The selection
of the strategies is based on a fuzzy concept of a
region's or line's "need for further processing". A measure
of this fuzzy notion is discernible from a set of performance
parameters [Levine&Nazif85a, Nazif83] reflecting
the quality of the segmentation at that instant in processing.
Such a control strategy is very appealing in that
it attends to the needs of the current segmentation and
also by nature is domain independent.
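
The idea of choosing a strategy from a fuzzy "need for further processing" can be sketched as follows. The performance parameters used in the actual system are defined in the cited work and are not reproduced here; the two measures (`uniformity`, `contrast`), the thresholds and the strategy names below are invented purely for illustration.

```python
def need_for_processing(uniformity, contrast):
    """Fuzzy score in [0, 1] built from two assumed quality measures.

    uniformity -- how homogeneous the region is (1.0 = perfectly uniform)
    contrast   -- how well its boundary separates it from its neighbours
    The fuzzy OR (max) of the two defects is used: a region needs attention
    if it is either non-uniform or weakly delimited.
    """
    return max(1.0 - uniformity, 1.0 - contrast)


def select_strategy(need):
    """Map the fuzzy need onto a processing strategy (hypothetical names)."""
    if need > 0.66:
        return "split"
    if need > 0.33:
        return "refine-boundary"
    return "leave"


def next_focus(regions):
    """Attend first to the region whose current need is greatest."""
    return max(regions,
               key=lambda r: need_for_processing(r["uniformity"], r["contrast"]))


regions = [
    {"id": 1, "uniformity": 0.9, "contrast": 0.8},
    {"id": 2, "uniformity": 0.2, "contrast": 0.9},
    {"id": 3, "uniformity": 0.6, "contrast": 0.5},
]
focus = next_focus(regions)
strategy = select_strategy(
    need_for_processing(focus["uniformity"], focus["contrast"]))
```

Nothing in the scoring refers to what the regions depict, which is why a control scheme of this kind can remain domain independent.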

The resulting intermediate description obtained from
this segmentation module is a region-based representation
of the image. However, the low level processor that
is envisioned would combine many functional modules to
provide a rich intermediate representation of the scene,
of which the segmentation module is one (see Figure 3).
A second module now being implemented, which transcends
the picture domain, is concerned with the extraction
of scene domain cues. Such three-dimensional cues
as occlusion, cast shadows, and skewness, extractable
from the image contour, give rise to some depth and
orientation information. Exploiting this information, the
shapes of objects may be inferred. This would yield
an object-based segmentation of the scene. Similar to
the segmentation module, the resulting partition of the
scene would be obtained from the integration of the refined
region-based segmentation and this initial
object-based segmentation. With the addition of other modules
(perhaps a segmentation based on texture or a surface
recovery module based on laser vision), the required
integration would certainly be of greater complexity.



Figure 3 The Low Level Processor 

The described low level processor, a general purpose 
subsystem by design, is oblivious to the task domain. 
It is in the high level processor where interaction with 
world knowledge is a necessity to achieve recognition or 
interpretation tasks. To accomplish this, the high level 
processor must possess the ability of matching object 
models to the intermediate representation supplied by the 
low level processor. More specifically, it must be able to 
resolve ambiguity (which is inherent in both the image 
data and world knowledge) and to identify instances of 
the object models by examining the consistency among
local image features. 

Some common paradigms that have been employed in
image analysis include constraint propagation, template
matching and hypothesis-verification
[Matsuyama84, Binford82]. In these methods, initially some match or
inference is made of image features to object models. Then
these initial inferences are verified for local consistency,
whether in a sequential manner as is the case for template
matching and hypothesis-verification, or in parallel
for constraint propagation. Local consistency at some
level is sufficient for object recognition tasks, but for
interpretation, the inferences must be propagated to attain
global consistency. These paradigms may be viewed as
consisting of two characteristic mechanisms, one to make
the initial inferences or matches and the other to propagate
them (see Figure 4).


Figure 4 The High Level Processor

The objective of this high level processor is to achieve
a scene description or object recognition given an
object-based intermediate representation. But because the high
level process is inherently limited by the quality of low
level segmentations, ambiguity may not be easily resolved.
Therefore, the high level process should have the ability
to integrate evidence from other intermediate representations
(region-based, line-based, etc.) in the inference
forming and propagation processes. As a final recourse
in the face of unresolvable ambiguity, the high level process
should be able to request that the low level process
either further refine or re-construct a part or the whole
of the intermediate description.

Work is now being initiated on the development of 
such a high level processor. 

5. Discussion 

Though the construction of a computer vision system
as a knowledge-based system is very attractive, problems
do present themselves. They stem from the limitations
and deficiencies of the representational framework,
the knowledge engineering tool, its data-driven nature
and knowledge itself. These shortcomings are not unique
to this application; they are apparent in knowledge-based
systems in general.

A major part of the effort in building a
knowledge-based system is the identification and acquisition of pertinent
knowledge applicable to the problem. Such knowledge
is limited in its scope, incomplete and inexact, because
we lack complete laws and theories about the problem.
This is representative of the various knowledge levels
(physical, perceptual, domain and processing) present
in computer vision systems. Often the knowledge is
ill-specified because it is not clear what exactly is known
about the problem or how to apply it. To improve the
performance of knowledge-based visual information processing
systems, a greater amount of knowledge must be
identified and applied to the problem. Unlike the domain
of expert systems, where there exist experts from whom
knowledge is accessible through interaction, knowledge
useful to computer vision systems must be determined
from the slow process of understanding human vision.

Control flow in a computer vision system such as ours
is governed by the data, but this data is often unreliable
and incomplete. As a consequence, such a system could
easily run astray. Coupled with ill-specified knowledge,
the possibility is even greater. To cope with this problem,
either the integrity of the data must be substantiated in
some manner, for example by incorporating redundancy
(confirmation or combination of evidence from multiple
sources), or the ability to reason with uncertainty must
be established.

Although OPS5 is a general purpose production system
programming language, our experience has shown
that as a tool for constructing computer vision systems
it suffers from several deficiencies. The principal one is
that it is inadequate for representing the diverse knowledge
and data that must be embodied. The predominant nature
of the knowledge that must be encapsulated, especially
at the low level, is procedural; that is, it prescribes a set
of operations. However, OPS5 facilitates neither procedural
mechanisms nor complex computations on the right
hand side of a rule. To capture a “chunk” of knowledge
often requires the chaining of a set of productions. As
well, there exist no generic control mechanisms that permit
the accessing of a set of data in an orderly fashion,
that is, the application of a rule (or a set of rules)
sequentially on a set of data. It is
possible to accomplish this, but it requires the construction
of specific control rules and the generation of control
state data to ensure the proper processing flow. Finally,
the data representation capabilities of OPS5 do not facilitate
the representation of the lowest forms of visual data.
There are no data structures for maintaining images, nor
are there constructs to manipulate them.

These inadequacies, and others encountered in using this knowledge
engineering tool, though not insurmountable, suggest that
some of our future work should be directed towards
developing a more suitable knowledge engineering
tool for constructing knowledge-based computer vision
systems. An adequate tool would make the system
more efficient and manageable. However, the specification
of such a tool would require one first to identify the
requirements for building a knowledge-based
computer vision system.

The rule-based methodology is a very general and
flexible framework for representing knowledge and data,
as is evident from its prevalent use in expert systems covering
a wide scope of problem domains. Even so, it is found
to be not entirely adequate for our purposes. Given
the nature of certain representations and processing
requirements in our system, our experiences with OPS5, as
discussed above, have shown that a classic pure production
system model has its deficiencies. This suggests
that a purely rule-based representational framework is
not appropriate. A blend of the rule-based model and the
imperative model would be more suitable.

The work that we have described here is only in its
formative stages. Though we cannot yet conceive of all
the many problems that will face us, we are
beginning to understand some of the major issues
involved in attempting to build such a massive system.
This knowledge will be invaluable in the future evolution
of this system.

Integration of Remotely Sensed Data and
Geographic Information Systems 

by 

David G. Goodenough 


Abstract 

Canada is heavily dependent upon the effective utilization
of its resources. To better manage the nation's
resources, resource managers are increasingly turning to 
computer-based technologies. Two particularly important 
technologies for resource information management systems are 
remote sensing and geographic information systems (GIS). 
Operational resource managers are using the geographic 
information systems to store digital representations of their 
resource maps. Associated with these graphical digital maps 
are databases containing the attributes of map features. 

The major (in terms of contribution to the GNP) land- 
based renewable resources are forestry and agriculture. In 
agriculture, the resource cover changes annually and should 
be monitored frequently during the growing season. For 
forestry, the changes are slower, but some provinces require 
annual updates of their forest inventories. Geographic 
information systems for forestry are loaded by manually 
digitizing existing forest cover maps. Approximately 6,000 
1:20,000 scale maps are required to cover British Columbia, 
for example. Traditionally, the updating of maps has 
required re-flying the area of interest. This is a costly 
procedure. The aerial photos do provide, however, high 
resolution imagery. Cost savings in geographic information 
system updating can be achieved by incorporating remote 
sensing data from satellites. 

Since 1978, we have conducted experiments to integrate
remote sensing data from satellites and aircraft. We have
also investigated the integration of these data with geographic
information systems. The experiments have shown that
this integration is difficult for several reasons. The
remote sensing data may not have sufficient spatial resolution
to show the features of interest. The remote sensing
data are geometrically corrected in Canada using federal NTS
maps, usually at 1:50,000 scale. There are substantial (up to
200 m) geometric errors between the provincial maps of some
provinces and the federal maps. Labelling of some ground
features in the GIS may be inconsistent. As a result, we
have concluded that artificial intelligence techniques are
required to handle the combinatorial explosion of problems
that reduce the effectiveness of integrating remote sensing data
and GIS.

In this presentation, we review the efforts in integrating
remote sensing data and GIS and present the approach
taken at the Canada Centre for Remote Sensing. The problem
of exchanging data amongst geographic information
systems is also briefly discussed.

Image segmentation based on color and texture gradient


Phu Thien Nguyen 
36 Ave Raymond Poincare, Paris 75116


ABSTRACT

An image segmentation scheme based on a 
model of human perception of color and texture 
is proposed. It consists of the following steps: 

• Spatial segmentation of each of the R, G and B images
using an edge detector, contour following 
and closing algorithms; 

• Characterization: each intersection of the 
segments is now characterized by: 

■ color: statistical parameters of the RGB 
values of pixels within the intersection; 

■ shape: perimeter, area, compactness; 

■ orientation; 

■ topology: neighbor, inclusion. 

• Merging of the elementary segments based 
on the selection of their attributes. 

• Segmentation by texture gradient based on 
the window correlation technique to identify 
"coherent texture" regions. 


RÉSUMÉ

Nous proposons une méthode de segmentation
d'image basée sur un modèle de vision de la
couleur et de la texture qui consiste en les
étapes suivantes:

• La segmentation spatiale des canaux Rouge,
Vert et Bleu par détection des contours avec
un opérateur de gradient, le suivi et la
fermeture des contours;

• La caractérisation de chaque intersection
des segments par:

■ couleur: paramètres statistiques des
valeurs des pixels à l'intérieur d'une
même intersection;

■ forme: périmètre, surface, compacité;

■ orientation;

■ topologie: voisinage, inclusion.

• L'agrégation des segments élémentaires par
la sélection de leurs attributs.

• La segmentation de l'image par le gradient
de texture en utilisant la corrélation des
fenêtres sur l'image pour identifier des
régions avec des "textures cohérentes".


INTRODUCTION 

The understanding of the human visual system
and of the photo-interpretation method adopted
by human analysts helps in designing algorithms
for image analysis. The approach is not
so much to simulate or, worse, to imitate the human
visual system but rather to understand the
underlying mechanisms in order to deduce a few
general criteria which could be implemented to
extract pertinent features and to identify objects
of different visual properties.

Different objects on images can be identified by 
either labelling pixels of similar properties or 
defining their boundaries. 

The methods of region growing, split and merge, and
clustering are typical of the first approach. The
second approach consists of methods which are
based on the detection of contours. Most existing
methods of contour detection are based on
the gradient in the gray level of a half tone image.
We propose a method of image segmentation
which is based on the integrated use
of three fundamental properties: the color, texture
and geometrical properties of elementary
segments on the image. Two methods of contour
detection are developed, one using a general
Sobel operator to detect the gradient in gray level
and the other using a correlation operator to detect
the texture gradient. These contours are followed
to define elementary segments. The elementary
segments are then aggregated using a clustering
method. Elementary segments are then merged
based on their geometrical and topological
properties.

The method is applied to the SPOT simulation 
data taken over an area in the Southwest of 
Paris. 

COLOR PERCEPTION

The trichromatic model of color vision 
(Faugeras 1976, Pratt 1978, Caelli 1981) is 
based on the differential absorption spectra of 
the three pigments found in the cones of the 
retina. The energies absorbed by the three types
of cone are:

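$$L(x,y) = \int I(x,y,\lambda)\, l(\lambda)\, d\lambda$$

$$M(x,y) = \int I(x,y,\lambda)\, m(\lambda)\, d\lambda$$

$$S(x,y) = \int I(x,y,\lambda)\, s(\lambda)\, d\lambda$$

where:

$I$: intensity of energy at a point $(x,y)$

$l(\lambda)$, $m(\lambda)$, $s(\lambda)$: spectral absorption curves

$\lambda$: wavelength (380 to 800 nm)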

L, M and S can be related to the primary
colors Red, Green and Blue by:

$$\begin{bmatrix} L \\ M \\ S \end{bmatrix} = U \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

where U is given by the CIE (Commission
Internationale de l'Eclairage) for the case of TV
set phosphors by:

$$U = \begin{bmatrix} .3098 & .6321 & .5818 \\ .1208 & .7665 & .1127 \\ .0042 & .1550 & .8408 \end{bmatrix}$$

The response of the cones seems to be nonlinear,
and L, M and S are transformed into L',
M' and S':

$$L'(x,y) = \log L(x,y)$$

$$M'(x,y) = \log M(x,y)$$

$$S'(x,y) = \log S(x,y)$$

A number of experiments seem to suggest that
in the visual system, these responses are combined
and processed as separate achromatic (A)
and chromatic (C1, C2) information, which can
be expressed mathematically as:

$$A = a(\alpha L' + \beta M' + \gamma S')$$

$$C_1 = u_1 (L' - M')$$

$$C_2 = u_2 (L' - S')$$

The coefficients are given in detail by Faugeras
(1976).

A is usually associated with the brightness of 
the image. 
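
The chain of transformations described in this section can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the matrix U is copied as printed above, and the coefficient values (a, alpha, beta, gamma, u1, u2) are placeholders chosen for the example, not the values given by Faugeras (1976). Inputs are assumed to be strictly positive so that the logarithm is defined.

```python
import math

# Matrix U exactly as printed in the text (rows give L, M, S from R, G, B).
U = [
    [0.3098, 0.6321, 0.5818],
    [0.1208, 0.7665, 0.1127],
    [0.0042, 0.1550, 0.8408],
]

def rgb_to_lms(r, g, b):
    """Linear cone responses: [L, M, S]^T = U [R, G, B]^T."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in U)

def lms_to_acc(l, m, s, a=1.0, alpha=1/3, beta=1/3, gamma=1/3, u1=1.0, u2=1.0):
    """Log-compress the cone responses, then form A, C1 and C2.
    All coefficient defaults are placeholders, not the Faugeras (1976) values."""
    lp, mp, sp = math.log(l), math.log(m), math.log(s)
    achromatic = a * (alpha * lp + beta * mp + gamma * sp)
    c1 = u1 * (lp - mp)
    c2 = u2 * (lp - sp)
    return achromatic, c1, c2
```

With equal cone responses the two chromatic components vanish, which matches the reading of C1 and C2 as opponent (difference) signals.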

SPATIAL VISION

The visual signals received by retinal receptors
are then transmitted to higher level visual cells
or groups of cells in the Lateral Geniculate Nucleus
and in the visual cortex.

The visual information is processed in a hierarchy
depending on the spatial and spectral responses
of these visual cells in the receptive
fields at different levels. The phenomenon is
not yet fully understood, but the following
findings (Nevada 1982, Caelli 1982, Pratt 1976,
Tsotsos 1982, Zucker 1984, Cretez 1984) give
some insight into how this process is done:

• In the retina, the photo-receptors are
grouped in ganglion cells organized into
groups of receptive fields. The number of
receptors in the ganglion cells is smallest
near the fovea and increases nearer to the
periphery. This explains the highest visual
acuity at the fovea. The spatial arrangement
of these receptive fields is suggested to be
concentric, hexagonal and organized into a
number of layers. The lateral inhibition effect
between cells in the receptive field explains
the perception of contrast from the
information given by the ratio of the response
of the central cells to peripheral cells.
This contrast sensitivity is a function of
the spatial frequency of the viewed objects.

• Chromatic response is higher at a lower 
spatial frequency than achromatic response. 

• Chromatic response is relative and the well-known
chromatic adaptation effect explains
the ability to adapt to local conditions of
lighting and color balance.

• The receptive fields in the cortex are 
elongated and hence the response to visual 
signal at this level is dependent on both 
spatial frequency and orientation. 

• Visual information processing is most precise
near the fovea, and the analysis of a
scene will require a specific pattern of foveal
movement. The analysis of this pattern
gives valuable information on how different
objects in the scene are being identified.


SEGMENTATION FROM COLOR 


An algorithm was developed to detect contours 
having a strong local gradient (Asfar 1981). The 
process consists of: 

• Detection of edges using a generalized Sobel 
operator, 

• Selection of points having a local maximum 
gradient, 

• Search for the nearest neighbors of each
point, taking into account the gradient direction,

• Construction of the contours from the 
neighbor image. 

The detected contours can be followed and
closed to isolate different elementary segments.
Each segment is then labelled and its
attributes computed.

The spatial segmentation is done separately for
each of the primary colors Red, Green and
Blue. From these segmented images an image
of intersection is created where each intersection
is given the average R, G and B values of all
pixels contained within the intersection. The
spatial segmentation of the SPOT data (fig 1) is
shown in fig 3. The segmentation can also be
done on the L, M and S or A, C1 and C2 components.
The R, G and B components are chosen
since it has been reported that no significant
difference is observed when using different color
feature sets (Ohta, 1985).
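
The construction of the image of intersection can be sketched as below. This is an illustrative reading, not the original implementation: it assumes that each channel has already been segmented into a 2-D array of integer labels, and it treats an intersection as the set of pixels sharing the same triple of labels, without checking connectivity. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def intersection_image(seg_r, seg_g, seg_b, red, green, blue):
    """seg_* are 2-D lists of segment labels, one per channel;
    red/green/blue are 2-D lists of pixel values of the same size.
    Returns (labels, means): labels[y][x] is the intersection id and
    means[id] is the average (R, G, B) over that intersection."""
    ids = {}                                   # label triple -> intersection id
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    labels = []
    for y in range(len(red)):
        row = []
        for x in range(len(red[0])):
            key = (seg_r[y][x], seg_g[y][x], seg_b[y][x])
            k = ids.setdefault(key, len(ids))  # new id on first occurrence
            acc = sums[k]
            acc[0] += red[y][x]
            acc[1] += green[y][x]
            acc[2] += blue[y][x]
            acc[3] += 1                        # pixel count
            row.append(k)
        labels.append(row)
    means = {k: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for k, s in sums.items()}
    return labels, means
```

Replacing every pixel by `means[labels[y][x]]` then yields an image in which each intersection carries a single average color.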


Pixels are assigned to the nearest cluster and its
center updated according to:

$$\mu_k \leftarrow \frac{n_k\,\mu_k + x}{n_k + 1}$$

where $n_k$ is the population of cluster k.


The clusters found can be: 

• split if their standard deviation is above a
certain threshold.


• merged if their distance:

$$d_{jk} = \sum_{i} \frac{[\mu_j(i) - \mu_k(i)]^2}{\sigma_j(i)\,\sigma_k(i)}$$

is below a certain value.
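
The center update and the merging test can be sketched as follows. The formulas in the printed text are partly illegible, so this sketch rests on two stated assumptions: the update is taken to be the usual running mean of a cluster that gains one pixel, and the merging distance is taken to be the squared difference of the centers divided by the product of the standard deviations, summed over the spectral bands. All names are hypothetical.

```python
def update_center(center, n_k, pixel):
    """Running-mean update after adding one pixel to a cluster of n_k pixels.
    Returns the new center and the new population."""
    new_center = [(n_k * c + x) / (n_k + 1) for c, x in zip(center, pixel)]
    return new_center, n_k + 1

def merge_distance(mu_j, sigma_j, mu_k, sigma_k):
    """Squared center difference scaled by the product of the standard
    deviations, summed over bands (the summation is an assumption)."""
    return sum((a - b) ** 2 / (sa * sb)
               for a, sa, b, sb in zip(mu_j, sigma_j, mu_k, sigma_k))
```

Two clusters would then be merged when `merge_distance(...)` falls below a user-chosen value.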


A clustering result for the original SPOT data is
shown in fig 2, for comparison with the result obtained
from the segmented images (fig 4).


SEGMENTATION 
BY TEXTURE GRADIENT 


The segmentation by texture is treated separately
and a texture gradient operator is proposed
based on the window correlation
technique. The analysis is restricted to the achromatic
component A (fig 5) because the eye is
more sensitive to spatial detail in the achromatic response.

At the boundary of two areas of different textures,
the correlation between two windows is
maximum along the boundary and minimum
across. By calculating the correlation of a central
window to eight neighboring windows, the
direction and the amplitude (difference between
the maximum and minimum correlation coefficient)
of the "texture gradient" can be derived.
From these two parameters contours can be
traced and followed to isolate "coherent texture"
regions.
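
A minimal sketch of the window correlation operator is given below. Several details are assumptions made for illustration: windows are square and non-overlapping, the eight neighboring windows are displaced by one window width, the Pearson coefficient is used as the correlation measure, and a constant window is given a correlation of zero. The direction is reported as the offset of the best-correlated neighbor, that is, the direction along the presumed boundary.

```python
import math

def window(img, y, x, w):
    """Flatten the w-by-w window whose top-left corner is (y, x)."""
    return [img[y + dy][x + dx] for dy in range(w) for dx in range(w)]

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences;
    returns 0.0 when either window is constant (a convention assumed here)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def texture_gradient(img, y, x, w):
    """Amplitude and direction of the texture gradient for the window at (y, x).
    The caller must ensure that all eight neighboring windows lie in the image."""
    center = window(img, y, x, w)
    corr = {(dy, dx): correlation(center, window(img, y + dy * w, x + dx * w, w))
            for dy, dx in OFFSETS}
    best = max(corr, key=corr.get)             # most similar neighbor
    amplitude = corr[best] - min(corr.values())
    return amplitude, best
```

On an image whose left part has vertical stripes and whose right part has horizontal stripes, a window next to the boundary gets a large amplitude, while a window inside a uniform texture gets an amplitude near zero.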


CLUSTERING 


An unsupervised classification is performed based on the L1
distance (Ramirez 1982):

$$d_j = \sum_{i=1}^{N_{band}} |\mu_j(i) - x(i)|$$

where:

$x$ is the pixel vector being classified,
$\mu_j$ is the vector of the center of cluster j,
j = 1, 2, ..., $N_c$, and $N_c$ is the number of clusters.
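
The assignment of a pixel to its nearest cluster under the L1 distance can be sketched in a few lines. The function names are hypothetical, and pixels and centers are represented as plain lists with one value per spectral band.

```python
def l1_distance(center, pixel):
    """Sum over bands of the absolute differences between center and pixel."""
    return sum(abs(m - x) for m, x in zip(center, pixel))

def nearest_cluster(centers, pixel):
    """Index j of the cluster center with the smallest L1 distance to pixel."""
    return min(range(len(centers)), key=lambda j: l1_distance(centers[j], pixel))
```

The L1 distance avoids the squaring and square root of the Euclidean distance, which keeps the per-pixel cost low when an entire image has to be classified.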


GEOMETRICAL FEATURE EXTRACTION 

The segmented image from either the segmentation
by color or texture gradient can be stored
in Run Length Coding format (Loodts
1985). The following attributes can then be
computed:

Perimeter: The perimeter is defined as the
number of pixels belonging to the border of
each surface. This is done by accumulating the
number of pixels of the same surface surrounded
by pixels belonging to other surfaces in
two consecutive lines.

Area: this is done by accumulating the length
values of pixels belonging to the same surface in
two consecutive lines.

Compactness: this is a measure given by the
parameter $A/P^2$, where P is the perimeter and
A the area of each segment.

Topological features: the relation between segments
such as neighborhood, inclusion etc. can
be determined by considering pixels belonging
to their common boundaries.
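
The perimeter, area and compactness attributes can be illustrated with the sketch below. It departs from the text in two labeled ways: it works directly on a 2-D label image rather than on run length codes, and it assumes that compactness is the ratio of the area to the squared perimeter. A pixel is counted as a border pixel when one of its four neighbors carries another label or lies outside the image (also an assumption).

```python
def segment_features(labels):
    """Return {label: (perimeter, area, compactness)} for a 2-D label image.
    Compactness is computed as area / perimeter**2 (assumed form)."""
    h, w = len(labels), len(labels[0])
    area, perim = {}, {}
    for y in range(h):
        for x in range(w):
            lab = labels[y][x]
            area[lab] = area.get(lab, 0) + 1
            border = False
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or labels[ny][nx] != lab:
                    border = True
                    break
            perim[lab] = perim.get(lab, 0) + (1 if border else 0)
    return {lab: (perim[lab], area[lab], area[lab] / perim[lab] ** 2)
            for lab in area}
```

For a 3x3 square embedded in a 5x5 image, the square has 8 border pixels and an area of 9, while the surrounding ring consists entirely of border pixels.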


ELEMENTARY SEGMENT MERGING 

After the segmentation, clustering and feature
extraction, each elementary segment is now
characterized by a set of parameters:

• color: statistical parameters of the RGB 
values of pixels within the intersection; 

• shape: perimeter, area, compactness, linearity;

• orientation; 

• topology: neighbor, inclusion.

The merging can be done by applying a decision 
based on the selection of these features. 

An example is the merging of elementary segments on
the texture classified image (fig 6) by the following
criterion:

"Small segments (surface < threshold) included
in a large segment (surface > threshold)
are merged with the large segment if their forms
are not compact ($A/P^2$ is small)".

The result of this merging is shown in fig 7. 
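
The merging criterion quoted above can be sketched as a simple decision rule. The data layout is hypothetical: each segment is a dictionary holding its area, its perimeter and the identifier of the segment that fully encloses it, if any. The compactness measure is assumed to be the area divided by the squared perimeter, and `compact_min` is an illustrative parameter that is not given in the text.

```python
def merge_small_segments(segments, threshold, compact_min):
    """segments: {id: {"area": int, "perimeter": int, "inside": id or None}},
    where "inside" names the segment that fully encloses this one.
    Returns {id: id of the segment it belongs to after merging}."""
    result = {}
    for sid, seg in segments.items():
        parent = seg["inside"]
        compactness = seg["area"] / seg["perimeter"] ** 2
        if (parent is not None
                and seg["area"] < threshold                 # small segment
                and segments[parent]["area"] > threshold    # large enclosing one
                and compactness < compact_min):             # not compact
            result[sid] = parent
        else:
            result[sid] = sid
    return result
```

A small but compact segment, such as an isolated building inside a field, is therefore kept, whereas a small irregular segment is absorbed by its surroundings.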


CONCLUSIONS 

The image segmentation as proposed integrates
the spectral and spatial attributes of each pixel
in the identification of each elementary segment
on a color image. Furthermore, these elementary
segments can be aggregated by an
intelligent selection of their attributes. This presents
an approach nearer to the method of
photo-interpretation, but there still exist a
number of drawbacks which can only be overcome
by further research on the following points:

• The segmentation process is based on global
features and the local environment cannot be
taken into account to cater for:

■ local chromatic adaptation;

■ size invariance: a fixed window cannot
be used over all regions of wide spectral
frequency variation;

■ orientation invariance;

• the definition of texture gradient by correlation
is still a very crude approximation;

• the segmentation of an image into regions
of different color texture with a consistent
pattern of primitives having a similar color
still remains to be investigated. A combined
use of texture and color information, obtained by averaging
the R, G and B segmented images
using the texture class map as a mask, is
presented in fig 8; this image shows the colors
of regions of "similar degree of
homogeneity";

• the hierarchical approach, which can be locally
adapted to the complexity of different regions
on the image.

Fig. 1. Color image of the Red, Green and Blue components.


Fig. 2. Classified image of fig 1 by clustering. 



Fig. 3. Color image of the intersection 
of the three segmented images. 


Fig. 4. Classified image of fig 3 by clustering. 

Fig. 5. Achromatic component of the Red, Green and Blue images.


Fig. 6. Classified image of the segmented image 
by texture gradient after clustering. 




Fig. 7. The result of elementary segment merging of fig 6.


Fig. 8. The result of selective smoothing 
of fig 3 using fig 7 as a mask. 

ABSTRACT

A Knowledge Based System (KBS) for analyzing 
LANDSAT MSS images and comparing this analysis to 
corresponding geocartographic data is presented. 
This paper discusses the preprocessing requirements 
for the LANDSAT and the geocartographic data for a 
uniform representation of the data. The segmentation
of the LANDSAT data and the interpretation of
the segments are presented. The preprocessed data 
are read into the Map/Image Congruency Evaluation 
(MICE) KBS where the image segments are classified 
and then compared with the map data, based on 
class, segment size, shape, and location. Results 
of the map/image congruency analysis are output and 
converted to image form. This paper presents the 
MICE KBS and reviews the results generated for a 
LANDSAT MSS scene of the Prince George area of 
British Columbia. 

KEYWORDS: LANDSAT, computer cartography, image 
analysis, artificial intelligence, 
knowledge based systems 

SUMMARY 

Remotely sensed data, particularly from the
LANDSAT series of satellites, are being used for a
wide variety of applications. One of the
more challenging applications is the
integration of remote sensing data with existing
cartographic data bases. It was found that simple
algorithmic data integration methods did not
provide satisfactory results due to various geometric
irregularities in the remote sensing data and
in the cartographic data. These spatial irregularities
could be due to factors such as temporal
differences between the data, spatial errors in the
map data or topographic effects in the remote
sensing data. Algorithmic techniques break down in
this data integration [BILLINGSLEY82]. Therefore,
we have tried to solve the integration problem with
a knowledge based system approach.

The Map/Image Congruency Evaluation (MICE)
knowledge based system was developed to study the
spatial differences between maps and images. Map/image
congruency evaluation means the determination
of the spatial agreement of features in the map
with the corresponding features in the image. The
study of the data integration problem included three basic
operations: (i) preprocessing
the data for uniform representation of both
the image and the map data; (ii) spatial reasoning
on the data using the PROLOG-based MICE system;
(iii) output of a congruency evaluation map from
the results of the MICE analysis.

The MICE system was evaluated using LANDSAT 
MSS data for the Prince George area of British 
Columbia and a BC provincial forest cover map. 
Various levels from the BC digital map were 
selected. These levels corresponded to single-line 
creeks and rivers; double-line rivers and lakes; 
road and utility systems; and the forest cover. 
Each level was gridded to a 50x50 metre grid. The 
LANDSAT data were geocoded by the Canada Centre for 
Remote Sensing (CCRS) Digital Image Correction 
System (DICS) to a UTM coordinate grid with 50x50 
metre pixels. The sub-area of the image corresponding
to the map was selected.

The LANDSAT image was then segmented to highlight
the various features. Numerous properties
such as the segment shape, size, location and 
spectral means were evaluated. The map data were 
similarly processed to determine properties such as 
shape, size and location. These data were then 
read into the knowledge based MICE system. 

The MICE system then evaluated the matching of 
the various segments from the map and image by 
examining the identification of the segments and 
the structure of the segments. The identification 
of the segments was done to determine if the
structurally corresponding segments have corresponding
identifications. For instance, a segment that
has been identified as a lake in the map data must 
correspond to a segment in the image that has a 
spectral signature that corresponds to a lake. If 
the LANDSAT segment does not have a corresponding 
spectral signature, then the segment is only weakly 
identified. Finally, the exact positions of the 
remaining segments are determined and all location 
differences are reported. 

The MICE system, which is currently under 
development, uses a variety of meta-level rules and 
object-level rules. These rules and some of the 
internal workings of the knowledge based system 
will be given, as well as suggested enhancements. 

INTRODUCTION 

For many years, human photo-interpreters have
been analyzing air-photos, deciding on the classification
of various objects in the photo and then
transcribing the classification and location of
these objects onto a map or more recently into a
geographic information system [ZARZYKI82]. Since
this map making procedure is primarily a human
endeavour, it is prone to human error. In
addition, the world land-mass is a constantly
changing entity. For example, rivers meander
further, forests burn or are cut, and subdivisions
and roads are built. The cartographic data, on the
other hand, are relatively static and are only
updated periodically.

For some time, the remote sensing community
has been extolling the virtues of the integration of
remote sensing data with Geographic Information
Systems (GIS). This data integration problem has
been researched and solutions developed, which are
used operationally by some agencies [HEGYI83].
However, the automatic integration of remote
sensing data with geographic information systems is
not yet possible as it still requires human interpretation
and assistance.

One of the first steps in the integration of
remote sensing data with GIS data is simply to
evaluate how similar or different the map and
image data are. It has been shown that algorithmic
techniques such as differencing and correlation
do not work very well [PARSONS84]
[GOODENOUGH85]. Also, many rule based systems for
image interpretation have shown promising results
[MCKEOWN85]. Thus, a knowledge based system for
the comparison or congruency evaluation of maps and
images was developed.

The MICE system was developed on a VAX 11/780
system running VMS. The VAX system hosts a suite
of software from Intergraph Corp. for processing
and manipulating cartographic data. The VAX also
hosts a large suite of image processing software
that was developed in VMS Fortran at CCRS. It was
decided that the existing software base would be used for
some of the processing programs. Additional processing
and reformatting programs were developed in
Fortran and the results fed into the MICE KBS. MICE
itself was written in M-PROLOG from Logicware Inc.

The Prince George area of British Columbia was
selected as a test area because a number of data sets
from a variety of sources were available. The
digital cartographic data from the BC Ministry of 
Forests was obtained. These data contained the 
hydrography, cadastral, forest, roads, railroads 
and other cartographic information required for a 
forest cover map. The map scale was 1:20,000 and 
corresponded to the UTM map number 93G096. 

The LANDSAT MSS geocoded image for the Prince 
George area (93G15) was obtained from CCRS. This 
DICS product [GUERTIN81] consists of the LANDSAT 
data scaled and projected onto a 50 metre grid, in 
a UTM projection. 

GIS PREPROCESSING 

The BC forest cover map was received and 
stored as an Intergraph design file. This file 
contained a variety of cartographic information, 
but for the purposes of this experiment, only the 
following information was processed: 

Information                         File Level

a) single-line creeks, rivers           5
b) double-line rivers, lakes            6
c) utility systems                      8
d) forest cover typelines               9


The levels were extracted and any spurious
information or text was deleted. Each level was
edited using an automated technique for ensuring
that all line intersections were cartographically
sound. Next, the levels were individually converted
from vector format to grid format, based on a
presence/absence algorithm, onto a 50x50 metre grid.
These grid files were then converted to CCRS
standard imagery files. Each polygon (such as a
lake) that was not fully filled was filled. The
image was precision registered to a UTM grid,
each entity of the map was identified uniquely and
its location was run length encoded. Finally, each
unique element, along with its run length encoded
location, was converted into symbolic object form.
The file containing these symbolic objects was
read into the MICE system. The procedure for
preprocessing the cartographic data is given in
Figure 1.
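
The run length encoding step can be illustrated by the sketch below. It is not the CCRS implementation: the grid is assumed to be a 2-D list in which each cell holds the unique identifier of a map entity, identifier 0 is assumed to mean background, and each run is stored as a (row, start column, length) triple. The function name is hypothetical.

```python
def run_length_encode(grid):
    """Encode a 2-D grid of entity ids as {id: [(row, start_col, length), ...]}.
    Id 0 is treated as background and skipped (an assumption)."""
    runs = {}
    for r, row in enumerate(grid):
        c = 0
        while c < len(row):
            ident = row[c]
            start = c
            while c < len(row) and row[c] == ident:   # extend the current run
                c += 1
            if ident != 0:
                runs.setdefault(ident, []).append((r, start, c - start))
    return runs
```

Each entry of the result pairs one entity with its encoded location, which is the information that the text describes as being converted into symbolic object form.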

IMAGE PREPROCESSING 

The LANDSAT MSS image (frame Id: 50458-18360)
used for this experiment was imaged on June 25,
1985. The MSS image was then precision geocoded on
the CCRS DICS system. The area of the image
corresponding to the BC forest cover map 93G096 was
extracted and precision registered to the
rasterized 93G096 map data. The MICE system then
requires the segmentation of this subscene and statistical
parameters for the resulting segments.

The image subscene is first operated on by a
specified gradient operation. The resulting file
is then segmented [BOILEAU85]. Each segment is
uniquely identified and its location run length
encoded. Next, statistical information on each
segment is generated. This statistical information
is shown in Table 1. The segment locations and
segment statistical values are converted into
symbolic object format for input to the MICE KBS.
The procedure for preprocessing the image data is
given in Figure 2.

MAP/IMAGE CONGRUENCY EVALUATION KNOWLEDGE BASED SYSTEM

The map/image congruency evaluation knowledge
based system is implemented in PROLOG using a shell
for developing hierarchical expert systems for
remote sensing [GOLDBERG85] [BRAUN85]. The implementation
is primarily divided into two rule types.
The first type is the meta-rule, which is a rule about
what MICE should do next, based on information
deduced to that point. The other type of rule is
the object rule, which is a rule that has been
input to the MICE system, or has been deduced by
the MICE system.

Meta-Level Rules : 

The meta-rule consists of four items. These 
items are: 1) condition predicate; 2) action 
procedure; 3) phase number; 4) rule number. 

FIGURE 1: THE PROCEDURE FOR PREPROCESSING THE CARTOGRAPHIC INFORMATION

FIGURE 2: THE PROCEDURE FOR PREPROCESSING LANDSAT MSS IMAGE

The condition predicate (or "if" part of the
rule) is evaluated by MICE to determine if the 
condition predicate is true. The action procedure 
(or "then" part of the rule) may then be executed 
if the condition predicate is true. The phase 
number is the strategy level within the meta-level 
procedure in which this rule is to be evaluated. 
The rule number is simply to uniquely identify each 
rule in the meta-level procedure. An example of 
two meta-level rules for one phase from MICE is as 
follows: 

    if:     the image segments were identified (ok)
    then:   compare map and image segment sizes
    phase:  9
    rule #: 17

    if:     not (the image segments were identified (ok))
    then:   write the string ("no image segments identified")
    phase:  9
    rule #: 18
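
The four-part structure of a meta-level rule and its evaluation within one phase can be sketched as follows. MICE itself is written in PROLOG; this Python rendering is only an illustration, and the class name, the fact dictionary and the evaluation order (ascending rule number within a phase) are assumptions rather than documented behavior of the system.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MetaRule:
    condition: Callable[[dict], bool]   # "if" part, evaluated on the known facts
    action: Callable[[dict], None]      # "then" part
    phase: int                          # strategy level of the rule
    number: int                         # unique rule identifier

def run_phase(rules: List[MetaRule], phase: int, facts: dict) -> List[int]:
    """Evaluate the rules of one phase in rule-number order.
    Returns the numbers of the rules that fired."""
    fired = []
    in_phase = sorted((r for r in rules if r.phase == phase), key=lambda r: r.number)
    for rule in in_phase:
        if rule.condition(facts):
            rule.action(facts)
            fired.append(rule.number)
    return fired

# The two phase 9 rules from the text, restated over a hypothetical fact store.
rules = [
    MetaRule(lambda f: f.get("image_segments_identified") == "ok",
             lambda f: f.setdefault("agenda", []).append(
                 "compare map and image segment sizes"),
             phase=9, number=17),
    MetaRule(lambda f: f.get("image_segments_identified") != "ok",
             lambda f: f.setdefault("log", []).append(
                 "no image segments identified"),
             phase=9, number=18),
]
```

Because the two conditions are complementary, exactly one of the rules fires in phase 9, whatever the state of the fact store.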

Object Rules : 

Object rules also consist of four items. These
are: 1) condition predicate; 2) action procedure;
3) rule number; 4) certainty factor. These rules
deduce object values based on the values of the
objects in the condition predicate. The rule
number uniquely identifies the rule and the
certainty factor is a value from 0 to 100.

Objects : 

Objects are the basic manipulation element of
MICE upon which deductions are made. Objects
consist of four values, which are: 1) object
context or description; 2) object attribute; 3)
object value; 4) measures of belief and disbelief.
MICE uses context values such as: source (image-MSS),
source (map-bcfs), segment (segment-number)
and class (class-name). Attribute values such as
location, size, mean-channel and shape
are used with the corresponding value of the attribute
in the object element. The measures of belief and
disbelief for each element are included. The
measures of belief and disbelief range from 0 to
100. A belief/disbelief value of 100 means that
this object is very believable/disbelievable. A
smaller value indicates less belief/disbelief in
this object. A sample object element is as follows
(in PROLOG notation):

obj([[*, source (image-MSS), *, segment (2), *, 
class (hydrography), * ], size, [2160], [75,25 ]]). 

This Prolog statement means: 

1) The source of the segment is the LANDSAT MSS 
image. 

2) The segment is segment number 2. 

3) The segment has been classified into the class 
hydrography. 

4) The attribute of this object element is the size 
of the segment. 

5) The attribute value or the size of the segment 
is 2160 pixels. 

6) The measure of belief for this object is 75 and 
the measure of disbelief is 25. 
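
For readers more familiar with record types than PROLOG terms, the same object element can be written as a small Python structure. This is a sketch under assumptions: the class name ObjectElement, its field names, and the describe helper are hypothetical and are not taken from MICE.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectElement:
    context: Dict[str, object]   # e.g. source, segment number, class
    attribute: str               # e.g. "size", "shape", "location"
    value: List[float]           # attribute value(s)
    belief: int                  # measure of belief, 0..100
    disbelief: int               # measure of disbelief, 0..100

    def __post_init__(self):
        # Both measures are stated to range from 0 to 100.
        for name in ("belief", "disbelief"):
            v = getattr(self, name)
            if not 0 <= v <= 100:
                raise ValueError(f"{name} must lie in 0..100, got {v}")


def describe(e: ObjectElement) -> str:
    """Spell out an element in the style of the six numbered statements above."""
    return (f"segment {e.context['segment']} from {e.context['source']} "
            f"({e.context['class']}): {e.attribute} = {e.value[0]}, "
            f"belief {e.belief}, disbelief {e.disbelief}")


# The sample element: segment 2 of the MSS image, classed as hydrography.
elem = ObjectElement(
    context={"source": "image-MSS", "segment": 2, "class": "hydrography"},
    attribute="size",
    value=[2160],
    belief=75,
    disbelief=25,
)
```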

A simplification of the agenda that MICE uses 
to perform the congruency evaluation is as follows: 

1) load the map object elements 
2) load the image object elements 
3) perform preliminary classification on image objects 
4) get the next map segment 
5) find all image segments near map segment (focus) 
6) compare class values 
7) compare segment sizes 
8) compare segment shapes 
9) compare segment locations 
10) output results 

The output generated by MICE is in the form of 
object elements, that were deduced by the KBS. 
These object elements indicate where the map and 
image segments overlap, where they are partitioned, 
where they are hierarchical and where they are 
bipolar. These elements are then converted from 
symbolic form to run length encoded format. 
Finally the results are converted to imagery 
format, which can be displayed and reviewed. 

SAMPLE OUTPUT 

An experiment using the MICE KBS was performed 
using LANDSAT MSS and digital map data from the 
BC Ministry of Forests. The sample outputs are for 
the double-line rivers and lakes data from the BC 
map. Figures 3 to 10 show the input and the 
output from various phases within the MICE system. 
They all correspond to the processing of the image 
or map data from Figure 3 or Figure 5, respectively. 

Figure 3 shows the input LANDSAT MSS image for 
band 7 (infrared, 0.8 µm to 1.1 µm) for the Prince 
George area of BC. The image has been geocoded to 
the UTM projection and resampled to 50 m x 50 m 
pixels. The date of the image is June 2, 1985. 
Figure 4 shows the same image following the 
application of the Sobel gradient operator, 
segmentation, and grey-level coding. The coding 
algorithm is for display purposes only. It reviews 
the segments and assigns each segment a digital 
value (grey level) such that no two neighbouring 
segments have the same value; non-neighbouring 
segments may share a grey level. Figure 5 shows 
the input map vector data for the double-line 
hydrology level of a BC forest cover base map. 
Figure 6 shows the same map data after it was 
cleaned and rasterized. Cleaning means removing 
annotation, processing vectors and points 
(overshoot/undershoot conditions), processing 
vectors against themselves (knot conditions), and 
processing vectors against other vectors (lobe 
conditions). Rasterization uses the 
presence/absence algorithm. 

Figure 7 shows the segments of the image that 
were identified as being in a map segment window 
(focused) and were also classified as hydrology. A 
focused image segment is an image segment that is 
"near" the map segment. The map segment window 
is the smallest rectangle that encloses the 
segment. An image segment is near (focused on) the 
map segment if any part of it lies within the 
window or within half the window length of it in 
any direction. 
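
The focus test can be sketched as a bounding-box check. The sketch below makes two assumptions that the text does not settle: segments are given as collections of (x, y) pixel coordinates, and "half the window length in any direction" is read as half the window width horizontally and half the window height vertically. The function names are hypothetical.

```python
def bounding_window(pixels):
    """Smallest axis-aligned rectangle (xmin, ymin, xmax, ymax) enclosing the pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def focus_area(window):
    """Window grown by half its width and half its height on every side (assumption)."""
    xmin, ymin, xmax, ymax = window
    half_w = (xmax - xmin + 1) / 2
    half_h = (ymax - ymin + 1) / 2
    return xmin - half_w, ymin - half_h, xmax + half_w, ymax + half_h


def is_focused(image_pixels, map_pixels):
    """True if any pixel of the image segment falls inside the grown map window."""
    fx0, fy0, fx1, fy1 = focus_area(bounding_window(map_pixels))
    return any(fx0 <= x <= fx1 and fy0 <= y <= fy1 for x, y in image_pixels)


# A thin horizontal map segment, 4 pixels wide and 1 pixel high.
map_segment = [(10, 10), (11, 10), (12, 10), (13, 10)]
near = is_focused([(15, 10)], map_segment)   # within half a window width
far = is_focused([(16, 10)], map_segment)    # just outside the grown window
```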

Figure 8 shows the focused image segments that 
are of similar size. Figure 9 shows the focused 
image segments with similar shape, and Figure 10 
shows the focused image segments that are 
determined to be overlapping segments. An image 
segment that is of similar size to a map segment 
satisfies the following rule: 

\[ 50 \le \frac{\text{map segment size}}{\text{image segment size}} \times 100 < 150 \]

An image segment that is of similar shape to a map 
segment satisfies the following rule: 

\[ 50 \le \frac{\text{map segment shape}}{\text{image segment shape}} \times 100 < 150 \]

An overlapping image segment is one where any pixel 
of the image segment overlaps any pixel of the map 
segment. 
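
The size, shape and overlap tests are simple enough to sketch directly. In the sketch below the function names are hypothetical, the same ratio test is assumed to serve for both size and shape measures, and the upper bound of 150 is treated as strict while the lower bound of 50 is inclusive, which is one reading of the rule as printed.

```python
def is_similar(map_value, image_value):
    """Ratio test: map value as a percentage of the image value lies in [50, 150)."""
    if image_value == 0:
        return False  # assumption: an empty image measure is never similar
    ratio = map_value / image_value * 100
    return 50 <= ratio < 150


def overlaps(image_pixels, map_pixels):
    """True if the two segments share at least one pixel."""
    return not set(image_pixels).isdisjoint(map_pixels)


# A map segment of 2160 pixels compared with a 2000-pixel image segment.
similar = is_similar(2160, 2000)
# Two segments that share the pixel (5, 5).
shared = overlaps([(4, 5), (5, 5)], [(5, 5), (6, 5)])
```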


CONCLUSIONS 

The results reported thus far are very encouraging 
for the use of knowledge-based systems for 
performing visual tasks, such as verifying the 
congruency of maps and images. This is a first 
step in automating the process of integrating 
remote sensing data with geographic information 
systems. Further work is required, and more rules 
must be added to enhance the functional performance 
of the congruency verification procedure, but the 
same techniques should also apply to extracting 
selected areas from the image and including this 
information in the GIS. Future work will include 
experiments with LANDSAT Thematic Mapper data and 
federal topographic maps. 

FIGURE 3 LANDSAT MSS band 7 image for Prince George 
area, BC. 

FIGURE 4 LANDSAT image following Sobel gradient 
operator, segmentation and coding for display. This 
image corresponds to the LANDSAT image of Figure 3. 

FIGURE 5 BC Ministry of Forests base map for 
double-line hydrography vector data. 

FIGURE 6 Map data following cleaning and 
rasterization of vector data from Figure 5. 

FIGURE 7 Image segments (from Figure 4) that were 
classified as hydrography and are near any map 
segment. 

FIGURE 9 Image segments (from Figure 4) that were 
of similar shape to any map segment that was near. 

FIGURE 10 Image segments (from Figure 4) that were 
overlapping any map segment. 
ABSTRACT 

This paper describes a context-based technique 
for smoothing digital thematic maps produced by 
multispectral classification of Landsat 
Thematic Mapper data. The output of this 
technique is a "maplike" product which can be 
directly used as input to a geographic 
information system. 

Keywords - classification, remote sensing, 
geographic information systems, image analysis. 

INTRODUCTION 

The work described here is the result of ongoing 
research at MacDonald, Dettwiler and Associates 
Ltd. into techniques for automated mapping 
utilizing remotely sensed data. 

Multispectral classification techniques have 
been used on Landsat data to produce landcover 
maps. Traditional pixel-by-pixel multispectral 
classification generally results in noisy or 
"speckly" images with a large number of small 
polygons, which complicate the thematic image 
and make it difficult to interpret. When 
classified imagery is used as an input to 
geographic information systems, this complexity 
hinders map update and production. Thus there 
is a need for an effective technique to convert 
the classified image to a more cartographically 
acceptable product. 

The role of a map is to effectively present 
information to users for their specific 
application at a given level of detail. In 
traditional mapping, a minimum mapping unit 
criterion is frequently used to simplify the 
map. In developing the final map, a cartographer 
takes into consideration contextual and esthetic 
factors. 

Context plays an important role in cartography. 
It has been found that a large proportion of the 
"errors" in automated computer classification 
are errors of contextual interpretation. Even if 
an image could be classified with 100% accuracy, 
it would still be in "error" if it did not 
correspond to the interpretation desired by the 
mapper. Interpretation frequently depends on 
size, shape, and context. A small clearing 
inside a forest may still be interpreted as 
"forest" whereas a similarly sized bog inside a 
forest may be interpreted as "bog". 

In this paper we demonstrate a technique which 
incorporates context in smoothing the digital 
thematic map to produce a more "maplike" 
product. 

REVIEW OF EXISTING TECHNIQUES FOR 
SMOOTHING DIGITAL THEMATIC MAPS 

Several different techniques for smoothing 
classified images have been documented, 
including majority and minimum area filtering 
techniques [DAV76, SCH83]. 

In the majority filtering approach, the center 
pixel of an N-by-N neighborhood is replaced by 
the majority class of the neighborhood, so that 
small isolated polygons are eliminated. Although 
this technique significantly reduces the number 
of polygons, small polygons still remain, as 
there is no explicit control over the minimum 
polygon size. With larger window sizes there 
is a tendency for the majority filter to erode 
smaller features and affect the integrity of 
the polygon boundary location. Smaller window 
sizes, however, do not produce adequate 
smoothing. 
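
A majority filter of this kind can be sketched in a few lines of Python. The sketch is not the implementation used in the cited work: the function name is hypothetical, "majority" is taken to mean the most frequent class in the window, the window is truncated at the image border, and ties are resolved by keeping the pixel's own class where possible (otherwise the smallest tied label).

```python
from collections import Counter


def majority_filter(grid, n=3):
    """Replace each pixel by the most frequent class in its n-by-n neighbourhood."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    r = n // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            # Count classes in the window, truncated at the image border.
            counts = Counter(
                grid[a][b]
                for a in range(max(0, i - r), min(rows, i + r + 1))
                for b in range(max(0, j - r), min(cols, j + r + 1))
            )
            best = max(counts.values())
            # Keep the pixel's own class when it is among the most frequent.
            if counts[grid[i][j]] < best:
                out[i][j] = min(c for c, k in counts.items() if k == best)
    return out


# A single isolated pixel of class 2 inside a field of class 1.
classified = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
]
smoothed = majority_filter(classified, n=3)
```

The isolated pixel is absorbed by its surroundings, but nothing in the filter refers to a minimum polygon size, which is the limitation noted above.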

In the minimum area filtering approach, the 
class of an undersized polygon is converted to 
the class of the polygon with which it shares 
the largest common boundary. Since there is no 
explicit control of the class conversion of 
undersized polygons, there may be undesired 
class conversions when an undersized polygon is 
converted to a very dissimilar class rather than 
to a more similar neighbor. 

INCORPORATING CONTEXT IN SMOOTHING 
OF DIGITAL THEMATIC MAPS 

Recognizing the importance of context in mapping, 
we conclude that an effective technique for 
smoothing thematic maps should attempt to 
incorporate context. 

As an example of the importance of context, we 
cite criteria used by the B.C. Forest Service 
in their production of forest cover maps 
[FOR82]: 

1. Minimum Area Criteria: 

   • "Minimum area may be fixed or variable, 
     with clearly defined, important stands 
     being mapped down to smaller areas than 
     those which are less well defined and 
     less important." 

   • "Minimum type size of approximately 
     1.5 cm and 1.0 cm for forest and 
     nonforest land respectively are 
     recommended regardless of photographic 
     scale." 

   • Exceptions are made to the minimum 
     area criteria in certain contexts 
     (see 3). 

2. Avoidance of Complicated Shapes: 

   • "Connect small types with similar 
     structure whenever possible." 

   • "Avoid complicated, irregular type 
     lines that hinder plotting, reading 
     of the map, and history updating." 

3. Context: 

   • "Type out small nonforest patches 
     isolated within high value types 
     and, conversely, high value patches 
     of timber within low value types." 

We base our technique for smoothing of digital 
thematic maps on the concept of minimal mapping 
areas and use context to guide conversion of 
undersized regions. A region is defined as a 
contiguous group of pixels of the same type. A 
region is undersized if it occupies an area 
less than that specified for its class. By 
allowing individual minimum sizes for each 
class and user-specifiable preferences for 
class conversion, we can incorporate spatial 
context, the significance of the land cover 
type, and the context of the end application 
into the smoothing process. 

Specifically, the criteria that we use in our 
contextual smoothing technique are: 

1. Each region must be larger than the 
   specified minimum mapping size for its 
   class or type (see Table 1). 

2. If an undersized region is surrounded, 
   then it is merged with the surrounding 
   region. 

3. Regions below the minimum size that 
   are not entirely surrounded by a 
   single class are merged with a 
   neighbour whose class is most similar to 
   the class of the undersized region. 
   Table 2 shows a similarity matrix in 
   which higher numbers represent an 
   increasing degree of similarity. 

4. If there is equal preference for a 
   merge, merge with the region which 
   shares the larger common boundary. 
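
The merge decision in criteria 2 to 4 can be sketched as a single selection over the neighbours of an undersized region. The paper does not describe its implementation, so everything here is an assumption: the function names, the representation of neighbours as (region id, class, shared boundary length) tuples, and the dictionaries holding a few values read from Tables 1 and 2 (with the columns of Table 2 assumed to follow the row order).

```python
def is_undersized(area, region_class, min_size):
    """Criterion 1: a region is undersized if its area is below its class minimum."""
    return area < min_size[region_class]


def choose_merge_target(region_class, neighbours, similarity):
    """Pick the neighbour to merge with.

    neighbours: list of (region_id, class_name, shared_boundary_length).
    A region with a single neighbour is surrounded and merges with it
    (criterion 2); otherwise the most similar class wins (criterion 3),
    with the longer shared boundary breaking ties (criterion 4).
    """
    if not neighbours:
        return None
    best = max(neighbours,
               key=lambda n: (similarity[region_class][n[1]], n[2]))
    return best[0]


# Illustrative values for the Water class (assumed reading of the tables).
min_size = {"Water": 15}
similarity = {"Water": {"Reccut": 2, "Rock": 3, "Snow": 3, "River Bar": 0}}

# A 9-pixel water region touching three neighbouring regions.
neighbours = [(7, "Reccut", 6), (8, "Rock", 4), (9, "Snow", 5)]
undersized = is_undersized(9, "Water", min_size)
target = choose_merge_target("Water", neighbours, similarity)
```

Rock and Snow tie on similarity here, so the region with the longer shared boundary is chosen.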

The minimum size of the regions and class 
similarities are specified by the user for the 
type of land cover being mapped, the mapping 
scale, and the end application of the map 
product. 

Detailed explanation of the implementation of 
the technique is beyond the scope of this 
paper. In our work, we utilize a vector 
representation of the regions as the basis for 
our operations. 

TEST RESULTS 

Evaluation of the contextual smoothing technique 
was performed on thematic maps of two study 
areas in British Columbia, Canada: Adam River 
on Vancouver Island and Cranbrook in 
southeastern British Columbia. The land cover 
maps produced from the classification of the 
Landsat scenes are for subsequent use in 
wildlife habitat mapping, and thus the 
interpretation or smoothing of the land cover 
map was tailored for this purpose. 

The Adam River and Cranbrook study areas were 
classified into 12 and 16 land cover classes 
respectively, using supervised maximum 
likelihood classification of Landsat 5 Thematic 
Mapper data. The results are shown in 
Figures 1 and 5. 

Prior to contextual filtering, a 3-by-3 majority 
filter is applied to eliminate small isolated 
groups of pixels (see Figure 2). Minimum 
polygon sizes and similarity matrices suitable 
for the application are then specified. 
Tables 1 and 2 show the minimum polygon sizes 
and similarity matrix used in the Adam River 
study area. Figures 4 and 6 show the results 
of contextual filtering on the classified 
images shown in Figures 1 and 5. 

Comparing the contextually filtered images to 
the raw classified images, we see a significant 
simplification of the images with a great 
reduction in the number of polygons. Table 3 
shows a relative comparison of the number of 
polygons after the different smoothing 
operations for the Adam River study area. 
Comparing the results of the 5-by-5 majority 
filter (see Figure 3) to those of the contextual 
filter, we see that although the larger majority 
filter smooths the image significantly, it still 
leaves small insignificant polygons which the 
contextual filter has eliminated. 

An important criterion in evaluating the 
smoothing technique is the accuracy of the 
resulting output. Evaluation of the accuracy of 
contextual smoothing (see Table 4) shows that 
the technique results in accuracies similar to 
those of large majority filters. The boundary 
locations of the polygons which result from 
this technique tend to be more accurate than 
those from majority filtering, due to the 
tendency of the majority filter to erode smaller 
features. Furthermore, with a contextual 
basis, the conversion of the undersized classes 
will in general be logically more accurate than 
with techniques which do not take context into 
account. 

Most importantly, the result is a product which 
is more "maplike", understandable, and 
compatible with geographic information systems. 

Although not illustrated in this paper, an 
additional contextual criterion, which considers 
the significance of the polygon, was tested and 
could be useful in some applications. The basic 
concept is that a smaller polygon should be 
retained when it is of high significance but 
not when it is of low significance. More simply 
stated, the minimum region size can change 
depending on the context: a smaller minimum size 
is warranted where the most similar adjacent 
polygon is significantly different (in our case, 
a similarity score of less than 3) than where 
the adjacent polygons are very similar. This 
criterion was used in further generalization 
of the image, and it was found that significant 
features were still retained. 

CONCLUSION 

In this paper we have shown an effective 
technique for converting a classified image into 
a more cartographically acceptable product. The 
result is an output that can be vectorized and 
directly input to a geographic information 
system without further digitization or 
generalization. 

We have shown the importance of context in 
mapping and demonstrated how context can be 
incorporated in a technique for smoothing of 
multispectral classification. 

This technique is particularly effective for 
filtering of multispectral classified imagery, 
but can be equally effective when applied to 
other types of digital thematic maps. 

TABLE 1 ADAM RIVER - MINIMUM REGION SIZES 

Class                      Minimum Size (Pixels)   Color Code
Hemlock Clearcut           100                     Dark Green
Hemlock Mature Seral       100                     Olive Green
Forested Rock              100                     Dark Brown
Red Alder                   50                     Bright Green
Young Hemlock Clearcut     250                     Bright Yellow
Huckleberry Clearcut       100                     Bright Orange
Fireweed Clearcut          100                     Medium Red
Recent Clearcut            100                     Bright Red
Rock                        50                     Medium Grey
River Bar                   25                     Light Blue
Water                       15                     Dark Blue
Snow                       100                     White


TABLE 2 ADAM RIVER - SIMILARITY MATRIX 

Columns (To) are numbered in the same class order as the rows (From). 

                        To
From               1   2   3   4   5   6   7   8   9  10  11  12
 1 Hemcc           5   4   4   3   2   1   1   1   1   0   0   0
 2 Hemms           4   5   4   3   2   1   1   1   1   0   0   0
 3 Forrock         4   4   5   3   2   1   1   1   1   0   0   0
 4 Red Alder       4   4   4   5   2   3   3   2   1   0   0   0
 5 Young Hemcc     2   3   2   1   5   4   4   3   1   0   0   0
 6 Huckcc          2   2   2   2   4   5   4   3   1   0   0   0
 7 Fweedcc         2   2   2   2   4   4   5   3   1   0   0   0
 8 Reccut          2   2   2   1   4   4   4   5   2   0   0   0
 9 Rock            1   1   4   1   3   3   3   4   5   0   0   2
10 River Bar       0   0   1   2   3   3   3   4   4   5   0   0
11 Water           0   0   0   0   1   1   1   2   3   0   5   3
12 Snow            0   0   1   0   0   0   0   1   4   0   0   5


TABLE 3 ADAM RIVER - NUMBER OF POLYGONS 

Processing                                   Number of Polygons
Maximum Likelihood Classification (MLC)      39,316
3-by-3 Majority Filtered MLC                  8,123
5-by-5 Majority Filtered MLC                  3,798
7-by-7 Majority Filtered MLC                  2,254
9-by-9 Majority Filtered MLC                  1,433
Contextually Smoothed MLC                       502


TABLE 4 ADAM RIVER - CLASSIFICATION ACCURACY 

Processing                                   Accuracy (%)
Maximum Likelihood Classification (MLC)      74.9
3-by-3 Majority Filtered MLC                 78.5
5-by-5 Majority Filtered MLC                 80.6
7-by-7 Majority Filtered MLC                 80.8
9-by-9 Majority Filtered MLC                 79.1
Contextually Smoothed MLC                    80.3

FIGURE 1 ADAM RIVER CLASSIFICATION 

FIGURE 2 ADAM RIVER CLASSIFICATION - 3 BY 3 MAJORITY FILTERED 

FIGURE 3 ADAM RIVER CLASSIFICATION - 5 BY 5 MAJORITY FILTERED 

FIGURE 4 ADAM RIVER CLASSIFICATION - CONTEXTUALLY SMOOTHED 

FIGURE 5 CRANBROOK CLASSIFICATION 

FIGURE 6 CRANBROOK CLASSIFICATION - CONTEXTUALLY SMOOTHED 
"PRINCIPLE OF VISUAL CODING OF COLOR APPLIED 
TO SATELLITE IMAGES" 


M-J. LEFEVRE-FONOLLOSA 
Engineer - Image Processing Division 


H. CRUCHANT 
Technician - Image Processing Division 

ABSTRACT 

In remote sensing, color is used essentially as 
a means of enhancing the results of image 
processing. In recent months, we have investigated 
the use of color as a novel means of processing the 
spatial information content of image data. 

Recent advances in the field of color vision by 
E.H. LAND (1978), T. WIESEL and D. HUBEL (1977) 
and S. ZEKI (1983) reveal that the visual system 
of higher mammals can be schematically divided 
into three successive segments, namely the optics 
(or eye, performing the "sensor" function), the 
neurobiological segment (from retina to cortex, 
performing the "encoding" function), and the 
cerebral segment (performing the "recognition 
and interpretation" function). Only the first 
segment is sensitive to electromagnetic radiation. 
The other two segments are conceptual and 
interactive. 

Apart from certain other advantages, the 
processing performed by the three successive 
segments enables Man to recognize the color of an 
object irrespective of variations in the light 
illuminating that object and in spite of the fact 
that such variation necessarily results in 
significant changes in the spectral composition of 
the reflected radiation. 

We have attempted to use these characteristics of 
the human visual system in a novel approach to 
the processing of remote sensing imagery. 
Specifically, we use three channels of a spaceborne 
radiometer to simulate the "sensor" function of 
the human eye while using computer processing 
to simulate the function performed by the second 
segment of the system. 

When an image is subjected to this type of 
simulation processing, the result is three new 
images termed the "color-coded image", the 
"lighting-coded image" and the 
"color-quantity-coded image". 

The paper concludes with comments on this approach 
and its prospects, with reference to examples 
based on TM and (simulated) SPOT data. 


RÉSUMÉ 

In remote sensing, color is mainly used as a means 
of enhancing and presenting results. Over the past 
few months, we have considered color as an original 
means of processing spatial information. 

Recent discoveries on color vision (E.H. LAND, 
1978; T. WIESEL and D. HUBEL, 1977; S. ZEKI, 1983) 
show that the visual system of higher mammals can 
be schematically broken down into three successive 
segments: optical (the eye, "sensor" function), 
neurobiological (from the retina to the cortex, 
"coding" function) and cerebral ("cognitive and 
interpretative" function). Only the first is 
stimulated by electromagnetic radiation; the other 
two are conceptual and interactive. 

Among other capabilities, this successive 
processing allows Man to recognize the color of an 
object whatever the variation in the illumination 
falling on it, even though this variation 
necessarily brings about an appreciable change in 
the spectral radiation reflected by the object. 

We have attempted to apply this visual 
characteristic to the processing of remote sensing 
images: starting from three channels of a 
spaceborne radiometer, itself regarded as 
fulfilling the "sensor" function of the eye, we 
simulate by computer processing the second segment, 
the "coding" of colors. 

The three new images thus created are called the 
"color coded image", the "lighting-coded image" 
and the "color-quantity-coded image". 

The value of this approach is discussed using 
examples drawn from Thematic Mapper and SPOT data. 
In recent years, considerable technical and 
scientific effort in imaging has been devoted to 
color in particular. We no longer regard color 
as a simple trichromatic mixture, but as a 
specific signal processed at each stage of the 
visual chain. To apply this knowledge to remote 
sensing image processing, we first had to develop 
a system that makes it possible to know with 
certainty the resulting color on any display 
device; we cite two significant examples. In 
addition, we use one of the particular properties 
of vision, the decomposition of the retinal image 
into a color image and an illumination image, and 
apply it to a remote sensing view of an unevenly 
lit forest. 

Between the retina and the cortex, the visual 
system can be schematically divided into three 
successive segments: 

- the first segment captures color information 
(cones sensitive to short, medium and long 
wavelengths) and then transmits this information 
to the relevant areas of the cortex as a modified 
signal: one achromatic channel and two chromatic 
opponent channels; 

- the second segment encodes color in layer V4 of 
area 17 according to the response of two types of 
cells: the WL and WLO cells, which are sensitive 
to the composition of the illumination, and the 
CO cells, which are sensitive to any variation in 
color. This parallel coding of illumination and 
color has the advantage of giving us color 
stability under varying illumination; 

- the third segment, less well understood, is 
that of interaction with the other senses, of 
the influence of culture, and so on. It allows us 
to interpret color. 

As a first step, we looked at the color 
representation systems needed in order to use 
display devices. However, since our aim is above 
all the understanding of vision applied to images 
in general and then to remote sensing images in 
particular, we chose a physico-perceptual system 
such as JUDD's cylindrical model. In practice, the 
display devices are calibrated so that they stand 
in a known relation to the model, by reference to 
color test charts organized as recommended by 
MUNSELL. 

A color remote sensing image is, as an image, 
similar to any other; it differs from others, 
however, in its information content. To retrieve 
and then extract the information from this image, 
it is obviously necessary to apply some 
"processing" to it; this is done in two steps: 
first, a preprocessing that makes the signal 
usable, and second, the processing proper, which 
is specific to the residual information sought. 
The original information may either remain in 
situ, as in the case of a color composite, or be 
taken out of its context and rendered graphically, 
as in the case of a classification. 
In the case of a color composite, our a priori 
principle being to respect the radiometric data as 
far as possible, we set out to identify and then 
preserve the relationships existing among the 
invariants. In coloring such an image, we take 
into account what the eye can read, that is, a 
difference in radiometric signal must correspond 
to a proportional perceived difference in color. 
Example: to this end, we applied the sensitivity 
curves of the eye to the chosen display model. 

In another example, we wish, on the contrary, to 
extract the informative invariant from its 
context. Here our principle is to respect the 
relational meaning that exists between the object 
and the interpreter. Our work has shown us the 
importance of a semiology of color (yet to be 
developed) when coloring a classification, so that 
the interpreter has better access to the data 
sought. 

In both cases, we therefore stress the importance 
of mastering color, of knowing the color 
representation models and the characteristics of 
the visual system, and of knowing the professional 
culture of the interpreter. 

The discovery of the coding functions of the 
second segment (retina to cortex) is recent (HUBEL 
and WIESEL). It pushes the limits of knowledge 
toward the third segment (within the brain). How 
can this property of the visual chain be of use in 
remote sensing? In this field, we have several 
passbands (channels) that capture the spectral 
information of a terrestrial landscape. If we 
limit ourselves to three channels, as is the case 
for SPOT, we can try to model this coding function 
for the spectral signal, that is, to transform the 
spectral values into two terms that independently 
encode the intrinsic color of objects and their 
illumination. In other words, carrying out this 
reduction and synthesis of information appears to 
us to be an important application of color. 
Drawing on the discoveries of the British 
neurophysiologist Semir ZEKI, three pieces of 
spectral information are reduced to two 
independent pieces of information. 

1) We create a "color coded image", sensitive to 
differences in color whatever the illumination, 
roughly comparable to the responses of ZEKI's CO 
(Color Coded) cells; this is also consistent with 
the experiments of E.H. LAND on the stability of 
color perception in different environments 
(Mondrians and Retinex). 

2) We have also created two color quantification 
images, roughly comparable to the responses of 
ZEKI's WL and WLO (Wavelength) cells, that is, 
sensitive to wavelength composition and to 
illumination: 

- the "lighting-coded image", equal to the 
quantity of total flux (decomposed into color flux 
and white flux), related to the composition of the 
spectral signature and indicative of the 
illumination; 

- the "color-quantity-coded image", which is the 
proportion of the color flux relative to the total 
flux. It is a variable indicating the quantity of 
color independently of the nature of the color 
itself. 
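
The text does not give the formulas behind the three images, so the following Python sketch is only one possible reading, offered to make the decomposition concrete. It assumes that the white flux of a pixel is three times its smallest channel value (the part common to all three channels), that the color flux is the remainder of the total flux, and that the color code is the share of each channel in the color flux. The function name decompose and all of these formulas are assumptions, not the authors' algorithm.

```python
def decompose(c1, c2, c3):
    """Split three channel values into (color code, lighting, color quantity).

    Assumed model: total flux = c1 + c2 + c3; white flux = 3 * min channel;
    color flux = total flux - white flux.
    """
    total = c1 + c2 + c3                  # "lighting-coded" value
    if total == 0:
        return (0.0, 0.0, 0.0), 0, 0.0
    lowest = min(c1, c2, c3)
    color_flux = total - 3 * lowest       # chromatic part of the flux
    quantity = color_flux / total         # "color-quantity-coded" value
    if color_flux == 0:
        color_code = (0.0, 0.0, 0.0)      # a purely white (grey) pixel
    else:
        # Share of each channel in the color flux ("color coded" value).
        color_code = tuple((c - lowest) / color_flux for c in (c1, c2, c3))
    return color_code, total, quantity


# The same surface under full and under halved illumination.
lit = decompose(60, 30, 30)
shaded = decompose(30, 15, 15)
```

Under these assumptions the color code and the color quantity are the same for the lit and the shaded pixel, while the lighting value halves, which is the kind of separation between color and illumination that the text describes.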

A simple algorithm allows us to display these 
three images separately, as perhaps a human brain 
deprived of its integration function might do. We 
describe the effects of such processing for two 
types of landscape: a tidal-flat landscape (Île de 
Noirmoutier, Vendée, FRANCE) and a forest landscape 
on rugged terrain (Montagne-Noire, Lauragais, 
FRANCE). 

The analysis of this work, which remains to be 
confirmed, shows that this psychosensory approach 
to image processing in remote sensing offers the 
interpreter a new and original discrimination 
tool. 

As an extension of this study, it seems useful to 
us to reflect, on the one hand, on the 
computational problems of color vision and their 
application in the field of computer vision and, 
on the other hand, provided that the internal 
language of the brain becomes better understood, 
on new methods of photo-interpretation. 


Graphics Interface ’86 

Vision Interface ’86 


A File Organization Scheme for Polygon Data 

Chung Hee Hwang & Wayne A. Davis 

Department of Computing Science 
University of Alberta 
Edmonton, Alberta, Canada T6G 2H1 


Abstract 1 

This paper presents a file organization scheme for 
representing polygon data by a quadtree. The proposed scheme 
is an adaptive cell method that is based on extendible hashing 
and interpolation-based index maintenance. It aims, on the 
average, to locate the record associated with a given key with 
one disk access (or at most two), maintaining a high storage 
utilization ratio. It also aims to process range search and set 
operations efficiently. The dynamic file organization capabilities 
of the scheme and the algorithm for range search are described. 


1. Introduction 

Given a sequence of k points $P_i = (x_i, y_i)$, for $1 \le i \le k$, in a 
plane, a polygon with vertices $P_i$ is the sequence of line 
segments, called edges, $P_1P_2, P_2P_3, \ldots, P_kP_1$. If these k edges 
do not have any intersection points, the polygon is simple. A 
simple polygon divides the plane into two distinct regions. The 
interior of a simple polygon is called a polygonal region. A 
complex polygonal region is a polygonal region which is 
allowed to have one or more holes in it. Hereafter, a complex 
polygonal region is called a polygon. A polygon network for a 
study area is a set of disjoint polygons overlapping the study 
area such that the set of polygons yields a total partition of the 
study area. Each polygon in a polygon network has a unique 
name. 

Although polygon networks have been traditionally 
represented in vector format, recently quadtree encoding of a 
polygon network has received increased attention. The quadtree 
[7] is a dynamic data structure developed to reduce the storage 
requirement of raster representation by aggregating 
homogeneous cells. Nevertheless, as the original quadtree 
concept was based on the assumption that quadtrees were 
resident in main memory, quadtree structures may not be directly 
applicable to data resident in external memory. For example, the 
need to follow pointers may lead to a larger number of page 
faults than are acceptable in an interactive environment. 

In an effort to overcome the frequent page fault problem, 
there have been studies to represent a quadtree as a linear 
quadtree [5] and use a B-tree file structure in organizing the data
[1,8]. While the B-tree organization of a linear quadtree is a
significant improvement over the original quadtree organization
in the expected number of disk accesses for single record
retrieval, the absence of a localization property is a primary
disadvantage. A query usually requires the whole file to be
retrieved even though the query can be answered with
information from a small part of the file. For example, range
search is very awkward and set operations are not efficient
because there is no implied connection between data buckets in
physical storage and regions in the search space. Furthermore,
the B-tree organization of quadtree encoded data still needs
several disk accesses to retrieve each record because it is
essentially a tree that is accessible with O(log n) I/O operations,
where n is the number of records in the file.

1 This research was supported in part by Grant NSERC A7634.

From this perspective, a file organization scheme is 
developed for polygon networks encoded as a linear quadtree 
with an aim to locate the record associated with a given key with 
an average of one disk access (or at most two), maintaining a 
high storage utilization ratio. Furthermore, the following types 
of spatial queries are to be supported efficiently: the point-in- 
polygon query, range search and set operations such as polygon 
union or intersection and polygon overlay. The scheme is an 
adaptive cell method and it is based on extendible hashing [4] 
and interpolation hashing [2], a k-dimensional generalization of 
linear hashing [6]. 

2. Definitions and Notation 

Let the study area, $U = [0, 2^n)^2$, be an image of $2^n \times 2^n$
unit square pixels that intersects a polygon network, and let each
of the pixels have a polygon name (hereafter called color)
associated with it. Furthermore, let the polygon network on U
be represented by a region quadtree. To yield an arbitrary but
consistent total ordering among the blocks of a quadtree, the
following hash function is introduced:

Definition 1. Let $(x,y) \in U = [0, 2^n)^2$ be the x and y
coordinates of the lower-left corner of a block of a quadtree
defined on U and have the following binary representation:

$x = \sum_{i=0}^{n-1} a_i 2^i$ and $y = \sum_{i=0}^{n-1} b_i 2^i$,
where $a_i, b_i \in \{0,1\}$. Then, an order preserving hash function,
s, that maps (x,y) onto the key space of $[0, 4^n)$ is defined by:

$s(x,y) = \sum_{i=0}^{n-1} (a_i 2^{2i+1} + b_i 2^{2i})$.
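
The hash function of Definition 1 amounts to interleaving the bits of x and y, with the x bits in the odd (higher) positions. A minimal Python sketch follows; the function name `s` mirrors the paper's notation, while the explicit parameter `n` (the number of bits per coordinate) is an assumption added for illustration.

```python
def s(x: int, y: int, n: int) -> int:
    """Interleave two n-bit coordinates into a 2n-bit key.

    Bit i of x goes to key position 2i+1, bit i of y to position 2i,
    following the summation in Definition 1.
    """
    key = 0
    for i in range(n):
        a_i = (x >> i) & 1  # i-th bit of x
        b_i = (y >> i) & 1  # i-th bit of y
        key |= (a_i << (2 * i + 1)) | (b_i << (2 * i))
    return key


if __name__ == "__main__":
    # 4 x 4 study area (n = 2)
    print(s(1, 0, 2), s(3, 2, 2))
```

For n = 2, the corner (1, 0) maps to binary 0010 (key 2) and the corner (3, 2) maps to binary 1110 (key 14).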

Notice that the key produced for each of the blocks in this 
manner is essentially the same as the locational code of the block 
in linear quadtree encoding [5]. Now, a file that represents a 
polygon network by a quadtree is defined: 

Definition 2. A file F representing a polygon network for the 
study area U by a quadtree is the set, 

$F = \{(K(L), S(L), C(L)) : L \in R\}$,
where R is the set of all leaf nodes (blocks) of a region quadtree 
on U, 

K(L) is the key of L produced by s, 

S(L) is the size, or alternatively the level, of L, and 

C(L) is the color of the pixels intersecting L.

In order to structure the file F using an adaptive cell method,
the study area is partitioned into a set of blocks and/or subblocks 
which are defined in the following. 

Definition 3. A block of depth d, $0 \le d \le maxd \le 2n$,
where maxd is the predefined maximum depth of a block 
partition, is a rectangular region in the study area with a standard 
shape and a standard location that are the same as those of a 
region produced by recursively halving the study area d times 
with lines alternately perpendicular to the x and y axes. A block 
with its depth equal to maxd is called a minimal block. 

Definition 4. A subblock of depth d, where 
$maxd < d \le 2n$, is a rectangular region in the study area with a
shape and a location that are the same as those of a region 
produced by recursively halving the study area d times with lines 
alternately perpendicular to the x and y axes. Within each 
minimal block there exist at most two different depths of 
subblocks, i.e., d' and d" such that d" = d' + 1. 
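
The recursive halving in Definitions 3 and 4 can be made concrete with a short sketch. The helper below is hypothetical (it does not appear in the paper). It assumes that the first halving line is perpendicular to the x axis, which is consistent with the x bits occupying the higher position of each bit pair in Definition 1, and it returns the lower-left corner and extent of the block or subblock whose index is given as a d-bit integer.

```python
def block_rectangle(index: int, d: int, n: int):
    """Return (x0, y0, width, height) of the depth-d region with this index.

    Assumption: halving alternates x, y, x, y, ... starting with x,
    and the study area is 2**n by 2**n pixels (so d <= 2n).
    """
    x0 = y0 = 0
    width = height = 1 << n
    for level in range(d):
        bit = (index >> (d - 1 - level)) & 1  # bits are read left to right
        if level % 2 == 0:   # halve along x
            width //= 2
            x0 += bit * width
        else:                # halve along y
            height //= 2
            y0 += bit * height
    return x0, y0, width, height


if __name__ == "__main__":
    print(block_rectangle(1, 1, 2))   # right half of a 4 x 4 area
    print(block_rectangle(2, 4, 2))   # a single pixel
```

With n = 2, index 1 at depth 1 is the right half of the study area, (2, 0, 2, 4), and index 0010 at depth 4 is the single pixel at (1, 0).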

Throughout this paper, maxd will be used to denote the
predefined maximum depth of a block partition. The following
definition is useful for defining an adaptive cell method.

Definition 5. A fixed data bucket is a bucket which contains
no more than a predefined number of records, b, and an
expandable data bucket is a bucket which may contain more than
b records by attaching one or more overflow fields to it.

An adaptive cell method is now defined that organizes the 
leaf nodes of a region quadtree into a file. 

Definition 6. An adaptive cell method of organizing a file F is 
an abstract data type which: 

(1) guarantees that, for every pair of cells $G_1$ and $G_2$ and for every
record $L_1 \in F \cap G_1$ and $L_2 \in F \cap G_2$,

$key(L_1) < key(L_2)$ if $index(G_1) < index(G_2)$,

(2) guarantees that, for every pair of subblocks $G_1'$ and $G_2'$ of a
minimal block G and for every record $L_1' \in F \cap G_1'$ and
$L_2' \in F \cap G_2'$,

$key(L_1') < key(L_2')$ if $index(G_1') < index(G_2')$, and

(3) asserts that every block of depth d has exclusively one fixed 
data bucket associated with it if d < maxd; otherwise (the 
case of a minimal block), it has associated with it either a 
single fixed bucket exclusively or two or more expandable 
buckets that are contiguously located in physical memory 
such that: 

a) each expandable bucket is exclusively associated with 
exactly one subblock of the minimal block, and 

b) the overall load factor, i.e., the ratio of the number of
existing records to the number of slots available, of these 
expandable buckets is within some predefined range. 

It is understood from Definition 6 that every data bucket is 
associated with a block or a subblock in the study area. 
Consequently, each data bucket has associated with it a depth 
which is equal to the depth of the block or the subblock it 
corresponds to. For an illustration of the concept of Definition 
6, consider the polygon network in Fig. 1. Let the predefined 
maximum depth of a block partition maxd = 4 (maxd is 
normally small so that the directory may be stored in main 
memory), the capacity of a data bucket b = 5, the capacity of an 
overflow field b' = 3, and the lower and upper limits of the load
factor be 0.40 and 0.75, respectively. Then, for the image of
Fig. 1a, the proposed scheme produces a partition as shown in
Fig. 1b.

Fig. 1. Image of a Polygon Network
and Partition of the Data Space.


3. Mapping between Regions and Data Buckets 

Ordinarily the set of records in F is distributed over a 
number of data buckets, and each data bucket has associated 
with it a block or a subblock in the study area. The mapping 
between blocks and data buckets is achieved by a directory. A 
directory is a set of elements, each of which corresponds to a 
cell of size $2^{2n-d}$, where d is the maximum of the depths of the
existing blocks. Thus, a directory has associated with it a depth 
whose value is the same as d. 

Each element of a directory has a pointer to a data bucket or a
set of buckets which contains records describing the quadtree
leaf nodes that intersect the corresponding cell in the study area.
At depth d of a directory, there are altogether $2^d$ pointers,
indexed from 0 to $2^d - 1$, which are not necessarily unique. The
pointers of a directory are indexed in such a manner that a data
bucket or a set of data buckets pointed to by a pointer with an
index i contains all the records whose keys are prefixed with bits
that are identical to the binary representation of i. That is, a data
bucket or a set of data buckets pointed to by pointer 0 contains
all the keys that start with d consecutive "0" bits, a data bucket
pointed to by pointer 1 contains all the keys that start with d - 1
consecutive "0" bits followed by a "1" bit, and so on. Thus, the
pointer i is guaranteed to find all the keys whose first d bits
agree with the binary representation of i.
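
The prefix rule above can be sketched in a few lines of Python. Both helper names are hypothetical; `n` is the number of bits per coordinate, so keys are 2n bits long. The second helper assumes, as in extendible hashing, that a bucket whose depth is smaller than the directory depth is reached through every directory index that shares its prefix.

```python
def directory_index(key: int, n: int, d: int) -> int:
    """Return the first d bits of a 2n-bit key, i.e. floor(key / 2**(2n - d))."""
    return key >> (2 * n - d)


def pointers_to_bucket(prefix: int, bucket_depth: int, d: int) -> range:
    """Directory indexes (at directory depth d) whose leading bits equal prefix."""
    shift = d - bucket_depth
    return range(prefix << shift, (prefix + 1) << shift)


if __name__ == "__main__":
    # 6-bit keys (n = 3), directory depth 4
    print(directory_index(0b100101, 3, 4))
    print(list(pointers_to_bucket(0b100, 3, 4)))
```

For example, with a directory of depth 4, a bucket of depth 3 holding keys that start with "100" is reached through pointers 8 and 9, and the key 100101 is looked up through pointer 9.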

This indexing scheme is in fact equivalent to a Morton 
sequence [7] and naturally satisfies the first and second 
requirements of Definition 6. Fig. 2 illustrates the 
correspondence between the regions in the study area and 
directory elements (indexes are shown both in decimal and 
binary). Note that when the depth of a directory is odd, each 
pair of buddies at the deepest level are numbered consecutively 
from left to right, e.g., the pair 0 and 1 and the pair 2 and 3 in 
Fig. 2a.

Fig. 3. Correspondence Between Blocks & Data Buckets.


The mapping between directory elements and data buckets, 
or sets of data buckets, is many-to-one. Fig. 3 shows the 
directory configuration corresponding to the partition shown in 
Fig. 1b. In Fig. 3, buckets C, E, F, G and H are fixed buckets,
and buckets A00, A01, A1, B0, B1, D00, D01, and D1 are
expandable buckets (bucket A1 has an overflow field attached to
it). Note that all the records contained in a data bucket of depth
d have the same bit pattern in their first d bits. Thus, bucket F,
whose depth is 3, consists of the records whose keys start with
"100", while bucket A00, whose depth is 6, contains all the
records whose keys start with "000000". Also, notice that
buckets A00, A01 and A1 contain 13 records in total while their
capacity is 18. Thus, the load factor of these three expandable
buckets is 13/18 = 0.72, which is within the predefined range.
The directory has $2^4$ pointers because the largest of the depths of
existing blocks is 4, which is the same as maxd. Note that
pointer 0 points to a set of buckets (A00, A01 and A1) which
contain all the records that start with "0000". The
correspondence between subblocks and expandable data 
buckets, however, is not shown in the directory. That is, the 
corresponding pointer in the directory points to the starting 
address of a set of buckets that are physically located together, 
but it does not specify the correspondence between each of the 
subblocks and data buckets. 
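
The load factor quoted above for the three expandable buckets of minimal block 0 can be checked with a small sketch. The function name and argument names are illustrative; the numbers are the ones given in the text (b = 5, b' = 3, three buckets, one overflow field, 13 records).

```python
def load_factor(records: int, buckets: int, overflow_fields: int,
                b: int, b_prime: int) -> float:
    """Ratio of existing records to available slots for one minimal block."""
    slots = buckets * b + overflow_fields * b_prime
    return records / slots


# Values quoted in the text: 3 * 5 + 1 * 3 = 18 slots, 13 records.
factor = load_factor(records=13, buckets=3, overflow_fields=1, b=5, b_prime=3)
within_range = 0.40 <= factor <= 0.75
print(round(factor, 2), within_range)
```

This prints `0.72 True`, in agreement with the range 0.40 to 0.75 used in the example.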

How the mapping between subblocks and data buckets is
achieved will now be shown. Fig. 4 shows examples of a
subblock partition of a minimal block. The subblocks of a
minimal block are indexed in a similar manner as the cells
corresponding to directory elements are indexed. In fact, when
all the subblocks are the same size, they are indexed in
exactly the same manner. Examples are shown in Figs. 4a and 4c.
However, when there exist two different sizes of subblocks, the 
larger subblocks have two candidates for their index. In that 
case, the smaller of the two is selected for the index of the 
subblock as in Figs. 4b and 4d. As a result, when subblocks 
are of different sizes, the indexes of subblocks are not 
continuous. 



Fig. 4. Subblocks & Their Indexes. 


Every subblock has an expandable data bucket associated
with it. The keys of the records contained in a subblock of
depth d have the same bit pattern in their leftmost d bit places.
More explicitly, their leftmost maxd bits agree with the index of
the minimal block the subblock belongs to, and the next (d -
maxd) bits agree with the index of the subblock itself. Now, the
mapping between subblocks and buckets is achieved by
numbering each of the buckets that belongs to the same minimal
block in a specific way as follows. Let a data bucket D be
associated with a subblock whose index is i. Furthermore, let
the maximum depth of existing subblocks be d. Then, i can be
represented by a bit string S which is (d - maxd) bits long.
Next, let k be a number represented by a bit string S' which is
the reversed bit string of S. Then, k is the bucket number of D,
i.e., D is the (k + 1)st of the set of buckets that are contiguously
located. The proof of this "reversed bit pattern" relation between
i and k can be easily shown by induction (see [2] for a formal
proof). Fig. 5 illustrates the correspondence between data
buckets and subblocks.

Fig. 5. Correspondence Between Subblocks
& Data Buckets.
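
The "reversed bit pattern" relation between a subblock index i and its bucket number k can be sketched as follows. The function name and the explicit `width` argument (the length d - maxd of bit string S) are illustrative assumptions.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the lowest bit to the front
        value >>= 1
    return result


if __name__ == "__main__":
    # Subblock indexes written with d - maxd = 2 bits
    for i in (0b00, 0b01, 0b10):
        print(format(i, "02b"), "->", reverse_bits(i, 2))
```

With two index bits, subblock indexes 00, 01 and 10 map to bucket numbers 0, 2 and 1 respectively.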


4. Dynamic Nature of the Scheme 

The proposed file organization scheme allows a file structure 
to adapt its shape automatically to the nature of the data to be 
stored, i.e., the amount and the distribution pattern. The 
adaptability of the scheme is obtained mainly by a dynamic 
partition of the data space, which is implemented by splitting 
and merging mechanisms. In this section the splitting
mechanism is briefly described. See [3] for details of the
dynamic file organization technique.

As more and more data is inserted in a file, data buckets 
overflow and this results in splitting of buckets. There are four 
kinds of splits possible. The first type of split occurs when a 
record is assigned to a data bucket that is full and pointed to by 
more than one pointer of the directory. In this case, the 
overflow bucket is split to resolve the collision, and the pointers 
in the directory are adjusted to reflect this split.

The second type of split arises when the overflow bucket is
pointed to by a single pointer, and the directory has not reached
its maximum depth yet. Then, in addition to a data bucket split,
refinement of the cell partition in the data space is required as
well as a directory doubling. A directory doubling involves
copying of the entire directory in such a manner that the old
contents of element i, for $i = 0, 1, \ldots, 2^d - 1$, where d is the old
value of directory depth, are copied into elements 2i and 2i+1.
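
Directory doubling as described above is a simple copy. The sketch below treats the directory as a Python list of elements; the function name and the list representation are assumptions made for illustration.

```python
def double_directory(directory: list) -> list:
    """Return a directory of twice the size: old element i goes to 2i and 2i+1."""
    doubled = [None] * (2 * len(directory))
    for i, element in enumerate(directory):
        doubled[2 * i] = element
        doubled[2 * i + 1] = element
    return doubled


if __name__ == "__main__":
    # Depth 1 directory with two pointers becomes a depth 2 directory.
    print(double_directory(["A", "B"]))
```

After doubling, both new elements 2i and 2i+1 still point to the same bucket; the subsequent bucket split then redirects one of them.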

The third case occurs when the depth of the directory has 
reached its maximum already. Then, the overflow bucket is split 
into two expandable buckets, numbered 0 and 1, bucket 1 being 
physically allocated after bucket 0. This is called a linear bucket 
split. 

The fourth type of split occurs when a record is assigned to 
an expandable bucket, and the load factor exceeds the upper limit 
as a result of insertion. Suppose a record is assigned to a data 
bucket of depth greater than d. Then, the record is first inserted 
into the bucket, or if necessary, into its overflow field, and the 
overall load factor of the set of buckets that are associated with 
the same minimal block is calculated and checked against the
predefined range. If the load factor exceeds the upper limit, a
linear bucket split is triggered, i.e., a new bucket is allocated at
the end of the existing buckets of the set, and the bucket
designated by the variable next to split, explained in the
following, is split into two. If the load factor still exceeds the 
upper limit, the splitting process is repeated. 

Similar to linear hashing [6], the following two variables are
used to control linear bucket splits: j, the split level, and p, next to
split. The split level, j, indicates the level of linear splits within
each minimal block. Initially, j is set to 0 for every minimal
block but is increased as linear bucket splits are performed so
that

$j = \max(\text{depths of all subblocks within the minimal block}) - maxd$.

Next to split, p, points to the bucket which is to split next. It is
initially 0 for every minimal block, but is increased by one as a
linear bucket split occurs. However, at the end of each cycle of
linear bucket splits, p is reset to 0. That is, during the first cycle
of splits, bucket 0 is split; during the second cycle, first, bucket
0, and then bucket 1 is split; and during the k-th cycle,
buckets are split in the order of 0, 1, ..., $2^{k-1} - 1$.
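
The interplay of j and p can be simulated with a small state object. This is a sketch of one reading of the description above, not the authors' code: the class name is hypothetical, a new cycle is assumed to start (and j to increase) whenever a split is requested while p is 0, and the bucket count follows the rule used in Algorithm RangeSearch ($2^j$ buckets if p = 0, otherwise $2^{j-1} + p$).

```python
from dataclasses import dataclass


@dataclass
class SplitState:
    j: int = 0  # split level of the minimal block
    p: int = 0  # number of the bucket to be split next

    def bucket_count(self) -> int:
        """Number of expandable buckets currently allocated."""
        if self.p == 0:
            return 2 ** self.j
        return 2 ** (self.j - 1) + self.p

    def linear_split(self) -> int:
        """Perform one linear bucket split; return the bucket that was split."""
        if self.p == 0:
            self.j += 1                    # a new cycle of splits begins
        split_bucket = self.p
        self.p += 1
        if self.p == 2 ** (self.j - 1):    # the cycle is complete
            self.p = 0
        return split_bucket


if __name__ == "__main__":
    state = SplitState()
    order = [state.linear_split() for _ in range(7)]
    print(order, state.j, state.p, state.bucket_count())
```

Seven successive splits visit buckets 0; 0, 1; 0, 1, 2, 3 (three full cycles) and leave j = 3, p = 0 with eight buckets.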

5. Range Search 

This section describes how the proposed file scheme 
supports range search. Given two points, $(x_1, y_1)$ and $(x_2, y_2)$,
where $x_1 \le x_2$ and $y_1 \le y_2$, specifying a query rectangle, the
proposed file scheme is able to retrieve every data bucket that 
contains the records describing the quadtree leaf nodes which 
overlap the query rectangle without retrieving any irrelevant data 
buckets. An algorithm for identifying the relevant data buckets 
for a given query rectangle is described using the example of 
Fig. 6. Suppose a directory has depth 4 which is the same as 
the predefined maximum depth. Suppose also that the shaded 
area in Fig. 6 is the query rectangle. Determination of the cells 
that intersect the query rectangle is done as follows: 

(1) Using Algorithm Access in Appendix B, determine the 
indexes of cells in which $(x_1, y_1)$ and $(x_2, y_2)$ are contained.
In this example, they are 2 and 14.

(2) Decompose the bit pattern of these indexes into their x and y-
components. Let the x-component of the lower index be the
lower limit of x and the x-component of the higher index be the
upper limit of x. Similarly, determine the lower and upper
limits of y.

    Lower index:   0010 (2)     xlow:  01 (1)
                   xyxy         ylow:  00 (0)

    Higher index:  1110 (14)    xhigh: 11 (3)
                   xyxy         yhigh: 10 (2)

Fig. 6. Range Search.

(3) For each of x and y, create a set of numbers which contains
all the integers that are between the lower and upper limits
inclusive.

x: {01(1), 10(2), 11(3)}
y: {00(0), 01(1), 10(2)}

(4) Obtain a cross product of these sets, where each member of
the resulting set is an integer produced by interleaving the x
and y components. This set then indicates the cells that
overlap the query rectangle. In this example, the following
cells overlap the query rectangle: 2, 3, 6, 8, 9, 10, 11, 12 and
14.

(5) Next, suppose some of the cells have been further
subdivided, e.g., cells 9 and 12. As for cell 9, every 
expandable bucket associated with it should be retrieved 
since the cell is completely contained in the query rectangle. 
Retrieval of these buckets can be done using Algorithm 
SequenRetrieve in [3]. As for cell 12, the subblocks that 
intersect the query rectangle should be determined in a 
similar manner as the cells intersecting the query rectangle. 
The bucket number corresponding to each relevant subblock 
is then obtained by reversing the bits of the subblock index. 
The detailed description of the algorithm is given in 
Algorithm RangeSearch in Appendix C. 
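
Steps (1) to (4) above can be reproduced with a short Python sketch. It is restricted to an even directory depth, where the x bits occupy the odd positions of the index; the function names are illustrative, and `shuffle` follows the definition given in Algorithm RangeSearch with an explicit bit count added as an assumption.

```python
from typing import List, Tuple


def split_index(index: int, d: int) -> Tuple[int, int]:
    """Split a d-bit cell index (d even) into its x and y components."""
    x = y = 0
    for k in range(d // 2):
        y |= ((index >> (2 * k)) & 1) << k       # even positions hold y bits
        x |= ((index >> (2 * k + 1)) & 1) << k   # odd positions hold x bits
    return x, y


def shuffle(v: int, w: int, bits: int) -> int:
    """Interleave v (odd positions) and w (even positions)."""
    result = 0
    for k in range(bits):
        result |= (2 * ((v >> k) & 1) + ((w >> k) & 1)) << (2 * k)
    return result


def cells_in_range(i_low: int, i_high: int, d: int) -> List[int]:
    """Indexes of all cells between the two corner cells (d even)."""
    x_low, y_low = split_index(i_low, d)
    x_high, y_high = split_index(i_high, d)
    return sorted(shuffle(ix, iy, d // 2)
                  for ix in range(x_low, x_high + 1)
                  for iy in range(y_low, y_high + 1))


if __name__ == "__main__":
    print(cells_in_range(2, 14, 4))
```

For the example of Fig. 6 (corner cells 2 and 14, directory depth 4) this yields the cells 2, 3, 6, 8, 9, 10, 11, 12 and 14 listed in step (4).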


6. Performance of the Proposed Scheme 

In this section, the performance of the proposed file 
organization scheme is compared with other file organization 
schemes in terms of access efficiency for single record retrieval. 
The object for comparison is a B+-tree that has been proposed 
and implemented for representing a polygon network by a linear 
quadtree [1,8]. In addition, the EXCELL method is also used 
for comparison. Although the EXCELL method was originally 
used for representing a polygon network in vector format [9], 
the method is also useful for representing a polygon network by 
a quadtree.

The B+-tree maintains consistent performance both in
storage and access time as its structure is not affected by the
pattern of data distribution. Its buckets are more or less
uniformly filled (each bucket being at least half full) even with
non-uniformly distributed data, and every record in the file can
be retrieved with O(log n) disk accesses, where n is the number
of records in the file. In contrast, the proposed scheme and the
EXCELL method, i.e., cell methods in general, are very
sensitive to the pattern of data distribution. With these methods, 
the best case occurs when data is distributed uniformly over the 
study area. Then, the proposed scheme requires one disk access 
to locate a record with a given key while the EXCELL method 
requires two disk accesses. The worst case occurs when data is 
distributed non-uniformly over the study area. With the 
EXCELL method, the directory will become large and unwieldy 
and the goal of two disk accesses for retrieving a record cannot 
be achieved. On the other hand, with the proposed scheme, 
there will be a long chain of overflow fields as well as many 
underflow buckets. Thus, both schemes may take O(n) disk
accesses for single record retrieval in the worst case. 

The poor performance of a cell method is due to its extreme
sensitivity to the existence of a random cluster of data.
However, as a hybrid of extendible hashing and linear hashing,
the proposed scheme allows its file structure to be considerably
adapted to the nature of data. As the directory of the proposed
scheme divides the study area into a coarse grid, any
non-uniformity of data distribution affects the file structure only
within a grid cell rather than over the entire study area.
Furthermore, the probability of the worst case happening in 
practical data is expected to be exceedingly low. Since a worst 
case analysis does not provide meaningful conclusions, the 
performance of the proposed scheme has been simulated using a 
set of real data. 

The scheme has been applied to a surficial geology map of 
the Wabamun area in Alberta, Canada (114°-115°W and 53.5°- 
54°N). Next, the same data have been used to estimate the 
performance of a B+-tree and the EXCELL method. It has been 
shown that the proposed scheme performs better than either a 
B+-tree or the EXCELL method in the expected number of disk 
accesses required to retrieve a record in a file, with a higher 
storage utilization ratio. See [3] for details. Although a formal 
proof cannot be given, it is conjectured that with the proposed 
scheme the expected number of disk accesses required for 
locating the record with a given key is constant irrespective of 
the file size, while that of B-trees or the (hierarchical) EXCELL 
method [11] grows logarithmically with the file size. 


7. Conclusion 

In most geometric databases, I/O operations are the 
bottleneck of their performance due to the large volume of data 
that should be handled. The proposed file organization scheme 
is an adaptive cell method which attempts to minimize the 
number of disk accesses in performing spatial queries. As a 
hybrid of interpolation hashing and extendible hashing, the 
proposed scheme combines the best features of both. First, the 
mapping between data buckets in physical storage and regions in 
the search space is interpolated rather than stored. Secondly, 
since a directory allows the search space to be divided into a 
coarse grid, any random cluster of data affects the file structure 
only within a grid cell rather than the entire file structure. 
Thirdly, a compromise between space and access time can be 
obtained by controlling the load factor. 

Another important feature of the proposed scheme is that it 
handles spatial queries, range search in particular, efficiently by 
allowing a query to be decomposed into a set of subqueries 
within cell restrictions. 


Experimental results with a set of real data show that the 
proposed scheme is superior to a B+-tree both in access 
efficiency and storage utilization. Additionally, the scheme is 
comparable to the EXCELL method which was originally 
proposed for representing a polygon network by vectors.


APPENDIX
A. Data Structure 

The file structure consists of a directory and data buckets. 
The directory has a header containing the depth of the directory 
followed by $2^d$ elements, where d is the depth of the directory.
Each element of the directory is a 5-tuple,
<j, p, occ, over, ptr>, where j is the split level, p is the
bucket to be split next, occ is the number of records contained in 
the data bucket (or set of expandable buckets), over is the 
number of overflow fields employed, and ptr is the pointer to the 
data bucket (or set of expandable data buckets). 

Each data bucket or overflow field contains a set of records, 
(K(L), S(L), C(L)). In addition to the set of records, each data 
bucket has a header that contains db, the bucket depth, and a 
pointer, ptr. If a bucket is an expandable one, ptr points to a 
chain of overflow fields attached to it; otherwise, it may be either 
ignored or used to point to the next bucket. 
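
The layout described in this appendix could be modelled in memory roughly as follows. This is only an illustrative sketch: the class names, the use of a string for the color, and the use of an integer address for `ptr` in a directory element are assumptions, since the paper does not fix these details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    key: int     # K(L), produced by the hash function s
    size: int    # S(L), size (or level) of the leaf node
    color: str   # C(L), the polygon name (type assumed here)


@dataclass
class OverflowField:
    records: List[Record] = field(default_factory=list)
    next: Optional["OverflowField"] = None   # next field in the chain


@dataclass
class DataBucket:
    db: int                                  # bucket depth
    records: List[Record] = field(default_factory=list)
    ptr: Optional[OverflowField] = None      # chain of overflow fields


@dataclass
class DirectoryElement:
    j: int = 0                  # split level
    p: int = 0                  # bucket to be split next
    occ: int = 0                # number of records in the bucket(s)
    over: int = 0               # number of overflow fields employed
    ptr: Optional[int] = None   # address of the bucket or set of buckets


@dataclass
class Directory:
    depth: int                          # header: depth d of the directory
    elements: List[DirectoryElement]    # 2**depth elements


if __name__ == "__main__":
    directory = Directory(depth=2,
                          elements=[DirectoryElement() for _ in range(4)])
    print(len(directory.elements) == 2 ** directory.depth)
```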

B. Algorithm Access

Input:  file F, directory dir to F, and $(x,y) \in U$,
        where $U = [0, 2^n)^2$ is the study area.

Output: data bucket D which contains (K(L),S(L),C(L)) such
        that $(x,y) \in L$. (The record may be in an overflow field
        of D.)

Note: dir[i].A denotes field A of the (i+1)st element of the directory.

Step 1: key ← s(x,y)

Step 2: read d, depth of directory

Step 3: i ← $\lfloor key / 2^{2n-d} \rfloor$    #determine dir index#

Step 4: loc ← dir[i].ptr    #read pointer value#

Step 5: if dir[i].j = 0, goto Step 8

Step 6: #case of split level ≠ 0#

   a. if dir[i].p = 0, #every bucket split#

      1) #set subblock index with j bits of the key#

         isub ← $\lfloor (key \bmod 2^{2n-d}) / 2^{2n-d-j} \rfloor$,
         where j denotes dir[i].j

      2) #calculate bucket no.#

         bnum ← $\sum_{k=0}^{m-1} a_k 2^{m-1-k}$, where
         isub is $\sum_{k=0}^{m-1} a_k 2^k$

   b. otherwise

      1) #set subblock index with (j-1) bits of the key#

         isub ← $\lfloor (key \bmod 2^{2n-d}) / 2^{2n-d-j+1} \rfloor$,
         where j denotes dir[i].j

      2) #calculate bucket no.#

         bnum ← $\sum_{k=0}^{m-1} a_k 2^{m-1-k}$, where
         isub is $\sum_{k=0}^{m-1} a_k 2^k$

      3) #if bucket split, adjust bucket no.#

         if bnum < dir[i].p and (d+j)th bit of key = "1",
         bnum ← bnum + $2^{j-1}$

Step 7: #calculate address of the bucket#

   loc ← loc + bnum * unit-length, where
   unit-length is the bucket size

Step 8: access bucket D at loc and exit.
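
Steps 5 and 6 of Algorithm Access can be sketched in Python as below. This is one reading of the algorithm under stated assumptions, not the authors' implementation: m is taken to be the number of bits of isub (j in case a, j - 1 in case b), the adjustment term in Step 6.b.3 is taken to be $2^{j-1}$, and bits of the 2n-bit key are counted from the left. The function names are illustrative.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bucket_number(key: int, n: int, d: int, j: int, p: int) -> int:
    """Bucket number within a minimal block (Steps 5 and 6)."""
    if j == 0:                               # Step 5: no linear split yet
        return 0
    rest = key % (1 << (2 * n - d))          # key bits after the directory prefix
    if p == 0:                               # Step 6.a: all buckets at level j
        isub = rest >> (2 * n - d - j)
        return reverse_bits(isub, j)
    isub = rest >> (2 * n - d - j + 1)       # Step 6.b: use j - 1 bits first
    bnum = reverse_bits(isub, j - 1)
    next_bit = (key >> (2 * n - d - j)) & 1  # the (d+j)th bit from the left
    if bnum < p and next_bit == 1:           # bucket already split in this cycle
        bnum += 1 << (j - 1)
    return bnum


if __name__ == "__main__":
    # 6-bit keys (n = 3), directory depth 4, split level 2, next to split 1
    for key in (0b000000, 0b000001, 0b000010):
        print(format(key, "06b"), "->", bucket_number(key, 3, 4, 2, 1))
```

Under these assumptions, with d = 4, j = 2 and p = 1, keys starting with 000000, 000001 and 00001 map to bucket numbers 0, 2 and 1. The bucket address then follows from Step 7 as loc + bnum * unit-length.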

C. Algorithm RangeSearch 

Input:  file F, directory dir to F, and (x1,y1), (x2,y2) ∈ U,
        where x1 ≤ x2 and y1 ≤ y2.

Output: retrieve every data bucket whose associated block or
        subblock in U intersects the query rectangle specified by
        (x1,y1) and (x2,y2).

Step 1: R ← { }    #initialize index set#

Step 2: read d, depth of directory

Step 3: #calculate keys and indexes#

   key1 ← s(x1,y1); i1 ← $\lfloor key1 / 2^{2n-d} \rfloor$
   key2 ← s(x2,y2); i2 ← $\lfloor key2 / 2^{2n-d} \rfloor$

Step 4: if i1 = i2, #trivial case#

   R ← {i1} and goto Step 12

Step 5: #determine limits of index#

   a. if d is even,

      1) xlow ← $\sum a_{2k+1} 2^k$, for $0 \le k < \lfloor m/2 \rfloor$

      2) xhigh ← $\sum b_{2k+1} 2^k$, for $0 \le k < \lfloor m/2 \rfloor$

      3) ylow ← $\sum a_{2k} 2^k$, for $0 \le k < \lceil m/2 \rceil$

      4) yhigh ← $\sum b_{2k} 2^k$, for $0 \le k < \lceil m/2 \rceil$

   b. otherwise,

      1) xlow ← $\sum a_{2k} 2^k$, for $0 \le k < \lceil m/2 \rceil$

      2) xhigh ← $\sum b_{2k} 2^k$, for $0 \le k < \lceil m/2 \rceil$

      3) ylow ← $\sum a_{2k+1} 2^k$, for $0 \le k < \lfloor m/2 \rfloor$

      4) yhigh ← $\sum b_{2k+1} 2^k$, for $0 \le k < \lfloor m/2 \rfloor$

   where i1 (i2) is $\sum a_k 2^k$ ($\sum b_k 2^k$), for $0 \le k \le m-1$

Step 6: ix ← xlow    #initialize x index#

Step 7: iy ← ylow    #initialize y index#

Step 8: #compute index of relevant block#

   if d is even, i ← shuffle(ix,iy);
   otherwise, i ← shuffle(iy,ix), where

   shuffle(V,W) = $\sum (2 v_k + w_k) 2^{2k}$, for $0 \le k \le m-1$,
   given V = $\sum v_k 2^k$ and W = $\sum w_k 2^k$

Step 9: R ← R ∪ {i}    #store the index#

Step 10: #continue until y upper limit is reached#

   1) increase iy by 1

   2) if iy ≤ yhigh, goto Step 8

Step 11: #continue until x upper limit is reached#

   1) increase ix by 1

   2) if ix ≤ xhigh, goto Step 7

Step 12: loc ← null    #initialize#

Step 13: for each member i in R, perform

   a. #determine address of bucket#

      if dir[i].ptr ≠ loc, loc ← dir[i].ptr;
      otherwise, goto Step 13.e

   b. if dir[i].j = 0,

      retrieve bucket at loc and goto Step 13.e

   c. #linear splits have occurred#

      if the block is totally contained in the query rectangle,
      retrieve every bucket belonging to the block using
      Algorithm SequenRetrieve and goto Step 13.e

   d. #the block is partially contained#

      1) R' ← { }    #initialize subblock index set#

      2) let (x1',y1') and (x2',y2'), where x1' ≤ x2' and
         y1' ≤ y2', be the points specifying the rectangle
         which is the intersection of the current minimal block
         and the query rectangle

      3) compute R' in a similar manner to Steps 3-11

         Note: in Step 8, d should be substituted with j

      4) #compute the number of buckets#

         if dir[i].p = 0, M ← $2^j$;
         otherwise, M ← $2^{j-1} + p$

      5) for each member isub in R',

         compute bucket number bnum;
         if bnum < M,

            calculate address of bucket and retrieve
   e. continue.


ABSTRACT

Methods of low level image processing have
predominantly been derived by generalizing one
dimensional signal processing methods to 2 dimensions.
Although much progress has been made
using Fourier domain analysis, it is still unintuitive
and inflexible for many types of applications.
This paper reviews mathematical morphology
as a basis for low level image processing,
and demonstrates that image domain transformations
can be applied usefully to various
types of images. Specifically, range images are
processed using grey scale morphology to extract
various features used in three dimensional analysis
of a scene. This type of scene analysis
has broad potential for applications in quality
control and automated manufacturing.


RESUME

Low level image processing has traditionally been
inspired by signal processing methods, generalized
to two dimensions. Processing based on Fourier domain
analysis has enabled important progress, but it
defies intuition and remains too inflexible
for many problems.

We report here on applications
of morphology as a basis for low level
image analysis, applying it to various
classes of images. More precisely, three dimensional
images are processed by the grey scale
morphological method to extract
various features suited to the analysis of three
dimensional scenes.

This class of analysis opens a wide range of
possibilities in quality control and computer
aided manufacturing applications.


I. INTRODUCTION 

The arrival of mathematical morphology for
image processing has allowed North Americans to
reconsider their approach to image transformation.
Transform domain techniques, like the
Fourier transform, used to decompose an image
into constituent spatial frequencies, grew out
of classical one dimensional signal processing
where the dominant analytical theme is linear
transformation. Conventional image processing
extended the subject base to two or more
dimensions, but the paradigm remained one of
filter design for removal of noise and enhancement
of visual fidelity.


A morphological approach gives the
designer of vision systems a ubiquitous tool to
perform image transforms in the image domain, 
using the algebra of shapes. Although the 
morphological treatment of images has been 
studied for more than 50 years, its recent 
popularization is due mostly to Serra (1), and 
Sternberg (2) (3), the latter introducing 
greyscale morphology. 

The use of mathematical morphology has
overcome several obstacles in the development
of industrial vision systems. Mathematical
morphology is first of all a mathematics of
image transformation and analysis, thus it
forms the basis for a powerful language of
image processing. This language is not only
powerful, it is intuitive, and can be understood
visually while developing image processing
algorithms interactively. As a general purpose
method of image transformation, all types of
images can be processed regardless of the sensor
used to collect the image. Satellite, microscope,
x-ray, T.V., ultrasound, tactile array
sensors, laser range finder images, etc., can all
be transformed productively using mathematical
morphology. This paper describes the basic
operations of how to transform an image with a
shape, calling on the reader's intuition instead
of the mathematical basis of the transformations
which are exhaustively described in (1) and (4).
We will demonstrate, using figures, transformations
on binary and grey scale images, including
images acquired using a laser range finder. It
will be pointed out that these morphological
transforms can be used for low level (data
driven), and high level (model driven) aspects of
industrial applications.


II. RANGE IMAGE PROCESSING

Currently range finding cameras are becoming 
commercially available. These devices use 
passive or active light sources to record two 
dimensional arrays of depth, i.e., the distance 
from the camera to the surface being imaged. 
Active light sources (often lasers) are used in 
two ways to collect depth information. Time of 
flight range finders are similar in concept to 
radar where the time required for the laser to be 
reflected from the scene is used as a measure of 
distance. Triangulation based range finders 
depend on the location of the reflected light to 
calculate the distance. A triangulation range 
finder developed in the Division of Electrical 
Engineering at the National Research Council was 
used in this research, and is described in (5). 
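
The time of flight principle mentioned above reduces to a one-line calculation, sketched here in Python. The function name and the sample timing are illustrative assumptions, not part of the system described in (5), which is a triangulation device.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def time_of_flight_range(round_trip_seconds):
    """Distance to the surface, given the time for light to travel out and back."""
    # The pulse covers the camera-to-surface distance twice, hence the division by 2.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# Illustrative value only: a 10 ns round trip corresponds to roughly 1.5 m.
distance = time_of_flight_range(10e-9)
print(round(distance, 3))
```

The short times involved (about 6.7 ns per metre of range) suggest why timing resolution matters for this class of range finder.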

(A survey of range finding methods can be found 
in (6).) The elements in a range image are 
referred to as "surfels" or surface elements, and 
must be processed using algorithms designed 
specifically for this type of image. In some 
respects a range image is easier to analyze than 
intensity images because the information in this 
type of image is a function of only one scene 
property, namely depth. Pixels in intensity 
images, such as those from T.V. cameras, are 
dependent on reflectivity, lighting intensity and 
direction, colour, etc., making robust algorithms 
elusive for all but the most controlled 
environments. 

Range image processing research has centered 
around extraction of information contained in 
edges and/or surfaces. Although it can be argued 
that one leads to the other, the various methods 
for extracting edge features differ significantly 
from region detection. Each method has virtues, 
depending on the ultimate goal of the research. 
Edge finding methods have been used in (8) (9) 
(10). Gil et al., (11) used registered range and 
intensity images to build a more complete edge 
map. Planar and/or quadratic surfaces are 
extracted in (7) (12) (13) (14). An interesting 
approach to collecting the maximum information 
from the image is to combine the edge and region 
detection methods. An operator for this purpose 
is described in (10). A more in-depth survey of 
related literature can be found in (15). 
Generally, both edge and region finding methods 
have grown from extending and manipulating ideas 
used in intensity images. A critical assumption 
which can be made about range images has been 
ignored. The information in a range image is not 
predominantly in the edges or the regions, but in 
the three dimensional shape represented by the 
image data. Instead of processing range images 
with edge or region operators, we suggest that 
shapes should be used. These shapes exist in the 
same coordinate system as the image data making 
the concept of shape transformations easy to 
visualize. A mathematics of shape transformation 
called mathematical morphology is described in 
the following section. 


III. MATHEMATICAL MORPHOLOGY 

Mathematical morphology is the algebraic 
treatment of shapes. A transformation based on a 
shape (sometimes referred to as a structuring 
element) is surprisingly easy to understand with 
the use of a simple visual example. There are 
two classical transformations that are most often 
used: erosion and dilation. Consider first the 
two dimensional, binary image application of 
these operations. What does it mean to erode and 
dilate a binary image with a shape? To 
illustrate the answer to this question we have 
inserted three figures of watch gears. Fig. 1 is 
the original binary image, Fig. 2 is a dilated 
image, and Fig. 3 is an eroded image. Notice the 
small disk in Fig. 1. This shape is used to 
dilate and erode the image into Figs 2 and 3 
respectively. The disk is not a part of the 
image, but is placed in the figure to help the 
reader visualize the transformations. In the 
following textual description, Fig. 1 is the 
original image, and Fig. 2 is the resultant 
image. To perform a dilation of the original 
image, place the structuring element (in this 
case a disk) so that the centre point falls on a 
pixel i, j in the original image. If the value 
of the pixel i, j in the original image is 1, 
then each pixel which is covered by the 
structuring element will become 1 in the 
resultant image. This step is performed for all 
pixels in the image. The reader is encouraged to 
verify that this is indeed how Fig. 2 resulted 
from Fig. 1. 
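
The dilation procedure just described can be sketched in a few lines of Python. This is a minimal illustration on a tiny image, not the implementation used to produce the figures; the helper names `disk` and `dilate` are assumptions, and pixels falling outside the image are simply ignored.

```python
def disk(radius):
    """Offsets (di, dj) covered by a disk structuring element centred on (0, 0)."""
    return [(di, dj)
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)
            if di * di + dj * dj <= radius * radius]

def dilate(image, offsets):
    """Binary dilation: every pixel of value 1 stamps the shape into the result."""
    rows, cols = len(image), len(image[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if image[i][j] == 1:
                # Each pixel covered by the structuring element becomes 1.
                for di, dj in offsets:
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        result[r][c] = 1
    return result

# A single foreground pixel grows into the shape of the structuring element.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1
dilated = dilate(image, disk(1))
for row in dilated:
    print(row)
```

With a disk of radius 1 the single pixel becomes a small cross, which is the digital approximation of the disk at that size.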



Fig. 1 Binary image of watch gears. 

Fig. 2 Dilated watch gears. 



The erosion transformation can be visualized 
in two ways. If the dilation is clearly 
understood, then the erosion transformation can 
be considered a dilation of the background, 
i.e., dilate the 0's in the image instead of the 
1's. Notice that Fig. 3 is the result of eroding 
Fig. 1 with the same structuring element used 
previously. The second way to understand an 
erosion is fundamentally the same as the 
description of the dilation. Place the 
structuring element at each pixel i, j in the 
original image. If the value of pixel i, j is 0, 
then each pixel which is covered by the 
structuring element will become 0 in the 
resultant image. 
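
Both views of erosion can be checked against each other with a short Python sketch. The function names are illustrative assumptions, the structuring element is assumed symmetric (as a disk is), and pixels outside the image are ignored.

```python
def stamp(image, offsets, trigger):
    """Every pixel equal to `trigger` stamps that value over the shape.

    The result starts out filled with the opposite value.
    """
    rows, cols = len(image), len(image[0])
    result = [[1 - trigger] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if image[i][j] == trigger:
                for di, dj in offsets:
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        result[r][c] = trigger
    return result

def dilate(image, offsets):
    return stamp(image, offsets, 1)   # the 1's spread

def erode(image, offsets):
    return stamp(image, offsets, 0)   # the 0's spread

def complement(image):
    return [[1 - v for v in row] for row in image]

# Radius-1 digital disk (a small cross), symmetric about its centre.
cross = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]

# A 3x3 block of 1's on a 5x5 background of 0's.
image = [[1 if 1 <= i <= 3 and 1 <= j <= 3 else 0 for j in range(5)]
         for i in range(5)]

eroded = erode(image, cross)
via_background = complement(dilate(complement(image), cross))
print(eroded == via_background)
```

Only the centre pixel of the block survives the erosion, and the two formulations agree on this example.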

The structuring element of a small disk was 
used in this explanation to demonstrate the 
utility of such a transform in an industrial 
application. In this case, a priori knowledge of 
the size of the gear and spacing of the gear 
teeth allowed the choice of the correct size 
structuring element which could be used to 
identify the broken teeth on the gears. This is 
an example of a model driven transform, where the 
model was a priori knowledge of the size and 
shape of the gear. 

Next consider what it means to transform a 
greyscale image, either range or intensity, with 
a three dimensional shape such as a ball. To 
facilitate this description we point out that a 
grey scale image can be thought of as a function 
f(x,y) on the points of Euclidean 2-space. In 
3-space a grey scale image is a set of points 
(x, y, f(x,y)) where f(x,y) is the pixel value. In 
a range image f(x,y) = z since the image is a 
three dimensional representation of the scene. 
This can be visualized as a thin, not necessarily 
continuous sheet. Fig. 4 is an oblique graphical 
representation of a range image, demonstrating 
how an image can be thought of as a sheet. (This 
image is further discussed in the following 
section.) 

Greyscale erosions and dilations on these sheets 
are most often used in conjunction with each 
other. An erosion followed by a dilation is 
called an opening, and a dilation followed by an 
erosion is called a closing. The opening or 
closing of a sheet by a geometrical solid is a 
grey scale transformation which treats different 
portions of an image uniquely, depending on how 
well the local grey level topology is matched by 
the shape. 

Fig. 5 illustrates a greyscale closing 
operation, using a ball. The ball is rolled over 
every pixel (surfel) on the image. Where the 
topology of the sheet prevents the ball from 
touching each pixel, the value of the pixel is 
raised to the surface of the ball. A greyscale 
opening is the mathematical dual. The ball is 
rolled on the underside of the sheet, and the 
pixel values are lowered to touch the ball. (To 
see this turn Fig. 5 upside down.) 

Fig. 3 Eroded watch gears. 
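
The rolling-ball description of closing and opening can be made concrete with a small Python sketch. It assumes the usual greyscale definitions (dilation as a maximum of sums, erosion as a minimum of differences) with a hemispherical structuring element; the function names and the test sheet are illustrative, and offsets that fall outside the image are ignored.

```python
import math

def ball(radius):
    """Height of the hemisphere above each offset (di, dj) in its footprint."""
    heights = {}
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            d2 = di * di + dj * dj
            if d2 <= radius * radius:
                heights[(di, dj)] = math.sqrt(radius * radius - d2)
    return heights

def grey_dilate(f, b):
    rows, cols = len(f), len(f[0])
    return [[max(f[i - di][j - dj] + h
                 for (di, dj), h in b.items()
                 if 0 <= i - di < rows and 0 <= j - dj < cols)
             for j in range(cols)] for i in range(rows)]

def grey_erode(f, b):
    rows, cols = len(f), len(f[0])
    return [[min(f[i + di][j + dj] - h
                 for (di, dj), h in b.items()
                 if 0 <= i + di < rows and 0 <= j + dj < cols)
             for j in range(cols)] for i in range(rows)]

def closing(f, b):
    return grey_erode(grey_dilate(f, b), b)   # ball rolled over the top of the sheet

def opening(f, b):
    return grey_dilate(grey_erode(f, b), b)   # ball rolled along the underside

# A flat sheet at height 10 with a one-pixel pit that the ball cannot enter.
sheet = [[10.0] * 7 for _ in range(7)]
sheet[3][3] = 0.0
closed = closing(sheet, ball(2))
print(round(closed[3][3], 3))   # the pit is raised to where the ball's surface rests
```

The pit is raised to just below the surrounding sheet, because a ball of radius 2 sags slightly into a hole one pixel wide, while the flat part of the sheet is left unchanged.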

Fig. 6(a), 6(b), and 6(c) are images of a 
man's face, where (a) is the original image, (b) 
is closed using a ball, and (c) is the residue 
image, i.e., Fig. 6(a) - Fig. 6(b). Fig. 6(b) is 
a version of Fig. 6(a) with some information 
removed. The removed information is considered 
to be where the ball would not fit into the sheet 
that represents the original image. A pixelwise 
subtraction, Fig. 6(a) - Fig. 6(b), actually 
extracts this information and is shown in Fig. 
6(c). This type of residue image is a common 
method of extracting useful information in 
industrial machine vision algorithms. 
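
The residue image is a pixelwise difference, as the following Python sketch shows. The two small arrays are hypothetical stand-ins for an original image and its closing; they are not data from Fig. 6.

```python
def residue(original, closed):
    """Pixelwise difference original - closed, as in Fig. 6(a) - Fig. 6(b)."""
    return [[o - c for o, c in zip(row_o, row_c)]
            for row_o, row_c in zip(original, closed)]

# Hypothetical values: the closing has raised the single low pixel in the middle.
original = [[10, 10, 10],
            [10,  4, 10],
            [10, 10, 10]]
closed = [[10, 10, 10],
          [10,  9, 10],
          [10, 10, 10]]

res = residue(original, closed)
changed = [(i, j) for i, row in enumerate(res) for j, v in enumerate(row) if v != 0]
print(changed)
```

Because a closing never lowers a pixel, this difference is zero or negative everywhere, and it is nonzero only where the ball could not reach the sheet.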

The power of these openings and closings is 
realized in the fact that these operations can be 
performed in real time using any shape and size 
of that shape that the application requires. 


Fig. 4 Oblique graphical representation of a range 
image demonstrating how an image can be thought of 
as a sheet. 

IV. MORPHOLOGICAL PROCESSING OF RANGE IMAGES 

Three dimensional information is explicit in 
a range image. Applying three dimensional 
operators to this data is demonstrated as a 
useful approach to transforming three dimensional 
data. Transforming data in the same coordinate 
system in which it is originally represented 
becomes feasible for real time applications with 
the advent of specialized hardware capable of 
3-D morphology. The ability to choose the size 
and shape of the operator used to transform the 
image implies some a priori knowledge. Depending 
on the objective of the processing, this a priori 
knowledge may be a low level of characterization of 
the image properties, such as signal to noise 
ratio, up to a complete 3-D model of what is 
expected in the scene. To illustrate this we 
examine the application of component placement 
verification on printed circuit boards. With 
automated component placement, it is necessary to 
inspect the boards visually to ensure that the 
components are present and properly aligned. Fig. 
4 is a graphical representation of a range image 
that shows components placed on a printed circuit 
board. This image is shown in Fig. 7(a) 
displayed as intensity, i.e., the bright areas 
are nearer the viewer than the dark background. 

If the range imaging process were ideal, an exact 
3-D representation of the scene would be 
contained in the data. There are some clearly 
defined reasons why this is not the case (5). 

The shadow effect, and specular reflectance, are 
inherent in this technology and cause errors in 
the data as can be seen in Fig. 4 and Fig. 7(a). 
It is often the case that the errors are 
considerably smaller than the objects or features 
of interest to the application. This presents 
the familiar problem of processing the image to 
remove the error that is relatively small, while 
preserving the information of interest. 

Typically this low level processing would be 
followed by model driven, higher level 
processing. A single morphological operation can 
accomplish both of these tasks. 

Returning to the component inspection 
application, we know a priori the size of the 
smallest component expected in the scene. Thus, 
we can choose a structuring element that is known 
to fit inside the data representing the 
smallest component expected in the scene. Using 
this structuring element, a 3-D morphological 
closing is performed, which in effect removes all 
peaks in the data that are smaller than the 
smallest object of interest. Fig. 7(b) is the 
result of this closing operation. A threshold is 
used to create a binary image from this processed 
image. Fig. 7(c) is the binary image 
representing the location of the components. 

Well known connectivity analysis of this binary 
image quantifies the results which can be 
compared to the expected data. 
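
The inspection sequence described above (morphological filtering, threshold, connectivity analysis) can be sketched in Python on a toy range image. Several assumptions are made: a flat square structuring element stands in for the 3-D shape, the function names are illustrative, and the filtering step is written as an erosion followed by a dilation, which in the common naming convention is the operation that removes peaks narrower than the structuring element.

```python
def window(img, i, j, k):
    """Values in the (2k+1) x (2k+1) neighbourhood of (i, j), clipped to the image."""
    rows, cols = len(img), len(img[0])
    return [img[r][c]
            for r in range(max(0, i - k), min(rows, i + k + 1))
            for c in range(max(0, j - k), min(cols, j + k + 1))]

def flat_erode(img, k):
    return [[min(window(img, i, j, k)) for j in range(len(img[0]))]
            for i in range(len(img))]

def flat_dilate(img, k):
    return [[max(window(img, i, j, k)) for j in range(len(img[0]))]
            for i in range(len(img))]

def remove_small_peaks(img, k):
    return flat_dilate(flat_erode(img, k), k)

def threshold(img, level):
    return [[1 if v > level else 0 for v in row] for row in img]

def connected_components(binary):
    """4-connected components, each returned as a list of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] == 1 and not seen[i][j]:
                stack, pixels = [(i, j)], []
                seen[i][j] = True
                while stack:
                    r, c = stack.pop()
                    pixels.append((r, c))
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
                components.append(pixels)
    return components

# Toy board: substrate at 0, one 3x3 component at height 5, one spurious spike.
range_image = [[0] * 10 for _ in range(8)]
for i in range(2, 5):
    for j in range(2, 5):
        range_image[i][j] = 5
range_image[6][8] = 9          # stands in for a small range error

filtered = remove_small_peaks(range_image, 1)   # 3x3 element fits inside the component
components = connected_components(threshold(filtered, 2.5))
print(len(components), [len(p) for p in components])
```

The component count and areas produced this way are the quantities that would be compared with the expected board layout.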

This processing is so simple in concept that 
automatic image processing programming based on a 
priori knowledge becomes feasible for some highly 
structured environments. It is not difficult to 
generate the steps for the application described 
above, based on a simple data base representation 
of the printed circuit boards. It is less clear 
that this is useful for more loosely structured 
applications, but is worth considering one 
application at a time. 

Fig. 7(a) Range image of a surface mount technology 
board, displayed as intensity. The bright 
areas are components, raised from the substrate. 

Fig. 7(b) The transformed image, using a morphological 
closing with a structuring element that is 
known to fit inside the smallest component. 

V. CONCLUSION 

Mathematical morphology can be used to 
transform images in both data driven and model 
driven processes. Choosing shapes (structuring 
elements) for data driven transforms is usually 
based on characteristics of the imaging device, 
and may accomplish such things as noise removal, 
edge effect removal, etc. In the model driven 
part of the algorithm, shapes are chosen based on 
a priori knowledge of what is expected in the 
scene. The morphological transforms are applied 
using these shapes to locate expected attributes 
in the image. 

Morphological transformations are ideal for 
processing of range images. Industrial applications 
for verification vision are plentiful, and 
using the methods described in this paper it is 
possible to totally automate the inspection task 
with sufficient a priori knowledge combined with 
highly reliable range images containing explicit 
3-D information. Although this scenario is the 
most appealing, there is a broad spectrum of 
applications varying on how much a priori knowledge 
is available, and the quality and type of 
images acquired. Mathematical morphology is 
sufficiently general to be used at all levels of 
image and scene analysis over this spectrum. 


ABSTRACT 

In the future intelligent mobile robots will be called 
upon to play many important roles. In many realistic 
situations, the knowledge of the structure and placement 
of objects in an environment should be learned rather 
than built in. Thus the mobile robot must often 
construct 3-dimensional models for the objects by 
analysing sensed multiple views. 

In this paper, we describe an approach to the 
incremental construction of 3-D body models in a 
practical office or warehouse environment by matching 
planned multiple views. In particular, we discuss the 
following aspects: 

1. the decomposition of a framed view and the 
construction of partial 3-D descriptions of the view; 

2. the matching of partial 3-D descriptions of a view 
with the built-in model of the robot environment; 

3. the matching of partial descriptions of bodies 
derived from the current framed view with partial 
models constructed from previous views; 

4. the identification of the new information in the 
current view and the updating of the models; 

5. the identification of the unknown parts of the 
models which are being constructed so that further 
vantage viewpoints can be planned. 

This approach combines such intelligent robot functions 
as attention, planning, sensing, learning and knowledge 
rectification. A prototype system for matching and 
constructing 3-D body models has been implemented 
and tested with synthesized images using C-PROLOG 
under Berkeley UNIX on a VAX 11/750. 


INTRODUCTION 

In the future computer-controlled robots will be called 
upon to play many important roles in industrial, 
business and domestic situations. If these robots are to 
work in complex environments it will be necessary to 
develop knowledge-based sensory systems. In simple 
situations, the robot vision system can have built-in 
models of both the environment and all objects within 
it; this allows a relatively simple recognition process. In 
more realistic situations, however, although the geometry 
of the surrounding environment may be known (i.e. the 
dimensions of the room, warehouse, etc, in which the 
robot operates), the type and position of the objects in 
the environment will generally be unknown. Thus 
knowledge of the structure and placement of these 
objects must be learned. To do this the mobile robot 
must first construct 3-dimensional models for the 
objects it encounters. It should then be possible to 
classify these objects by comparing their structural 
properties with those of generally known classes of 
objects such as benches, chairs, tables, etc. 

In analysing a single framed view of part of a large 
scene, the problems which will generally stand in the 
way of constructing the 3-D body models include: 

1. partial features; 

2. self-occlusion; 

3. occlusion; 

4. accidental alignment and special alignment; 

5. undetermined geometric parameters. 

An approach to understanding a scene from image 
sequences by incrementally constructing body models 
seems promising. However, even today, the information 
processing load involved in analysing a sequence of 
images presents a serious technical problem. Dynamic 
selection of a minimal set of vantage viewpoints and 
effective selection of only the necessary information will 
be essential if the burden of computation is to be 
lightened. Fortunately, a mobile robot, by its nature, 
offers a good foundation for gathering information from 
different points of view. Thus combining a vision 
system with a planner, so that a scene can be analysed 
from planned multiple views, is both natural and 
necessary. 

In this paper, we describe a system which 
incrementally constructs 3-D object models of an office 
or warehouse scene from planned multiple views. In 
particular, we address the matching and construction of 
3-D partial models. 

To limit the scope of the immediate research problem 
the following assumptions have been made: 

1. The bodies in the environment are static, rigid, 
weakly externally visible, and have vertices formed by 
at most three surfaces. Edges are formed by two 
surfaces, which can be planar, conical, cylindrical or 
spherical. 

2. The only lights in the environment are one point 
source and one diffuse source. 

3. The shape and dimensions of the robot environment 
are given. 

4. A pinhole spherical camera model is used to acquire 
the images. 

5. There is a preprocessor which deals with early and 
intermediate vision processing of the visual data. The 
output of the preprocessor is equivalent to a complete 2 
1/2 D sketch. The categories of facets and lines, the 
orientations of planes and the rough depths of junctions 
have already been extracted from the 2 1/2 D sketch. 

Partially constructed models which remain incomplete 
after a sequence of views are analysed by a viewpoint 
planning system, which is described in a companion 
paper¹; using this system new views can be chosen to 
resolve the ambiguities. In any realistic situation, we 
would expect that the task assigned to the robot would 
also provide input to the planning system so that a 
decision could be made to ignore incomplete objects 
which were irrelevant to the current task. 

The body models (either partial or complete) which 
are constructed from the multiple views have Boundary 
Representation (BR) like representations. Once a 
complete model has been constructed, a rule based 
conversion system which has been described elsewhere² 
is used to transform the BR representation into our 
new "Constructive Solid Geometry - Extended Enhanced 
Spherical Image" (CSG-EESI) representation which 
provides both structural and geometric information for 
the bodies. Higher level 3-D models can be more easily 
derived from the CSG-EESI representation. This 
facilitates object classification by comparing a structure 
with those for prototype objects which might be 
expected to be in the environment. 

BACKGROUND 

There has been considerable research on the 
segmentation and labeling of images. After Guzman³, 
Huffman⁴, Waltz⁵ and Turner⁶, Chakravarty⁷ generalized 
a line and junction labeling scheme that deals with 
planar-faced or curved-surface solid bodies, having 
vertices formed by three surfaces. In this scheme, 3 
types of lines and 8 types of junctions were defined. By 
dealing with regions and lines, objects can be correctly 
labeled by the set of junction types. 

An object must often be observed from several 
directions in order to form an assessment of what it 
looks like or to form a 3-D model of it. In order to 
form a 3-D model of an object from sequential views, 
Underwood and Coates⁸ developed a program which 
forms a 3-D description of a planar convex object when 
the object is rotated in space. In their match 
algorithm, the connections between surfaces, the number 
of edges which bound a surface and the clockwise 
ordering of edges form the deterministic factors. Later, 
Preiss⁹ described an approach which interprets a 
standard engineering drawing of a planar object for 
construction of its 3-D representation. This approach 
consisted of three main steps: 

1. interpretation of projected faces, 

2. interpretation of dashed lines, 

3. assembling them into a body. 

The connectedness properties and the geometric 
relationships between junctions or faces (such as 
coplanar relationships), were used for matching 
junctions. In this approach, the final 3-D representation 
of a body is complete, and consists of each of its faces, 
edges and vertices along with the three coordinates of 
each vertex. 

Several researchers have investigated the problem of 
matching multiple views of a blocks world in front of 
a featureless background. Using wide-angle stereo, 
Ganapathy¹⁰ designed a scheme which uses some 
heuristic rules, such as "Single Match", "Order Match", 
"Connectivity" and "Table Match", to choose an initial 
match between corresponding vertices in order to reduce 
the search space. His program finally stops after 
building up the 3-D coordinates of the vertices. 
Shapira and Freeman¹¹,¹² developed a program for 
constructing a description of solid bodies from a set of 
pictures taken from different vantage points. A 
heuristic procedure was devised for establishing matches 
between junctions in the different pictures and 
determining the validity of doubtful junctions. It first 
establishes matches among junctions by using the 
constraints of projection and the connectedness between 
junctions; then it establishes matches among lines by 
using the cyclic-order property and fills in missing 
connections between junctions and missing junctions. 
The final description reported by the program involves 
bodies made up of their face groups which are 
described in terms of triples of matched lines. Asada¹³ 
developed a system which describes 3-D motions of 
jointed trihedral blocks. In this system, a Huffman-like 
labeling scheme and an object-to-object matching 
method are first used to segment the line-drawing 
images into individual blocks and to find the possible 
correspondence of their junctions between closely 
consecutive frames. A transition table of junction labels 
and contextual information is used to analyse structural 
changes of the line drawings. Then, the shape rigidness 
property of three vertices on a block is used to 
evaluate geometrical parameters, such as the 3-D 
coordinates of the vertices and motion parameters. 

It is only in recent years that attempts have been 
made to match multiple views in a complex 
environment in order to incrementally construct some 
kind of model of a scene. Herman, Kanade and 
Kuroe¹⁴ described the 3-D MOSAIC project whose goal 
is to incrementally acquire a 3-D model of an urban 
scene from images. Their method is to first extract 
3-D shape information from the images by stereo analysis, 
then to match two views based on junction matching 
and finally to generate an approximate model of the 
scene by using task-specific knowledge. Crowley¹⁵ 
described a navigation system for an intelligent mobile 
robot which included techniques for the construction of 
a line segment description of a recent sensor scan and 
the integration of such descriptions to build up a model 
of the immediate environment using a list of directed 
line segments. Herman¹⁶ described an algorithm which 
matches vertices in two 3-D descriptions. The algorithm 
consists of three main steps: first, initial matches are 
obtained for each vertex based on local properties; next, 
a Waltz-filtering procedure is applied which propagates 
topological constraints to reduce the set of matches; 
finally, a tree-type search which uses both topological 
and geometrical constraints gives globally consistent sets 
of unique matches. 

OVERVIEW OF THE SYSTEM 

The system which we have developed for 
incrementally constructing 3-D models of objects is 
illustrated schematically in Figure 1. 

The incremental construction of object models from 
planned multiple views involves the following principal 
elements: 

1. the decomposition of a framed view and the 
construction of partial 3-D descriptions of the view; 

2. the matching of partial 3-D descriptions of a view 
with the built-in model of the robot environment; 

3. the matching of partial descriptions of bodies 
derived from the current framed view with those partial 
models constructed from the previous views; 

4. the identification of the new information in the 
current view and the updating of the models; 

5. the identification of the unknown parts of the 
models which are being constructed so that further 
vantage viewpoints can be planned; 

6. the finding of the relationships between bodies and 
their environment and the construction of a partial map 
of the scene. 

In the following sections we discuss elements 1 - 5 in 
turn. Each of these elements is more or less related to 
matching. In our approach, matching is based on the 
rules (constraints) derived from geometry, topology, 
photometry, triangulation and problem assumptions. 

Strategies for recognition can be data-directed 
(bottom-up), knowledge-directed (top-down) or some 
mixture of the two. In our approach, a bottom-up and 
top-down mixed strategy is used to match partial 3-D 
descriptions of a view with a built-in model of the 
robot environment and a data-directed strategy is used 
to incrementally construct 3-D body models for the 
objects in the environment. We are interested in 
exploring how far we can go with a data-directed 
strategy. 

In the scene learning process the robot vision system 
generally produces incomplete and erroneous knowledge 
of the objects in the environment; consequently, it is 
important to identify the unknown parts, to rectify the 
erroneous knowledge and to assimilate the new 
information. Our approach combines such functions of 
an intelligent robot as attention, planning, sensing, 
learning and knowledge rectification. 

DECOMPOSITION 

For each view, the partial 3-D descriptions of bodies 
are derived by labeling and segmenting the image. In 
general, the decomposition process first merges the 
regions that are separated by shadow lines, and then it 
labels the junctions and lines in the image. After 
labeling, internal representations are created for real 
vertices, edges and faces. At that time, those edges 
separated by virtual junctions are united, and the partly 
viewed edges and faces are identified. In the last step, 
faces are combined on internal edges to form bodies. 
Some relationships between bodies, such as "touch by" 
or "occluded by" will also be identified. Thus, 
hierarchical internal representations which are used as 
partial 3-D descriptions are constructed for each body 
in a view. 

Starting from the Chakravarty and Waltz labeling 
schemes, a modified and extended labeling scheme has 
been devised for labeling scenes containing shadows and 
certain curved objects. The images are labelled by an 
expert system; the labeling knowledge (production rules) is 
stored separately in micro-knowledge-bases according to 
the categories of related junctions. The top level of 
the expert system controls the sequence of labeling. 
It first arranges the addresses of the junctions that 
need to be labeled in an "ORDER QUEUE". The 
junctions which are generally easier to label, such as 
"p", "w" and "y" types, are arranged at the front of 
"ORDER QUEUE". When a junction and its related 
lines have been successfully labeled, the junction is 
deleted from the "ORDER QUEUE". Meanwhile, its 
immediately adjacent junctions will be inserted in a 
"PRIOR QUEUE". The top-level expert then 
propagates the labeled junctions to their immediate 
neighbours; thus the labeling procedure can take advantage 
of derived facts and the labeling time can be reduced. 
The second level of the expert system selects the 
appropriate micro-knowledge-base, according to the 
category of the junction given by the top-level expert, 
and sequentially selects the production rules from the 
selected micro-knowledge-base in order to label the 
junction and its related lines. At the lowest level of 
the system are the processes which carry out the 
following tasks: 

1. fetch the related facts from the appropriate micro- 
databases; 

2. match the facts with the condition of a rule; 

3. incorporate the label results into the appropriate 
micro-databases; 

4. add the addresses of the immediate neighbours of 
the successfully labeled junction into the "PRIOR 
QUEUE". 
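
The queue-driven control of the labeling described above can be sketched in Python. This is an illustration under stated assumptions, not the C-PROLOG implementation: the "PRIOR QUEUE" is assumed to be served before the "ORDER QUEUE", a junction that cannot yet be labeled is assumed to be retried later, and `try_label` is a hypothetical stand-in for the production rules in the micro-knowledge-bases.

```python
from collections import deque

EASY_TYPES = ("p", "w", "y")   # junction types the text names as easier to label

def run_labeling(junctions, neighbours, try_label):
    """Return junction ids in the order in which they were successfully labeled.

    junctions:  dict mapping junction id -> junction type
    neighbours: dict mapping junction id -> list of adjacent junction ids
    try_label:  callable(junction_id, labeled_so_far) -> bool (hypothetical rule base)
    """
    # ORDER QUEUE: all junctions, with the easier types at the front.
    order_queue = deque(sorted(junctions,
                               key=lambda j: (junctions[j] not in EASY_TYPES, j)))
    prior_queue = deque()      # PRIOR QUEUE: neighbours of freshly labeled junctions
    labeled = []
    stalled = 0                # consecutive failures; stop when no progress is made
    while order_queue and stalled < len(order_queue):
        if prior_queue:
            j = prior_queue.popleft()
            if j not in order_queue:       # already labeled in the meantime
                continue
        else:
            j = order_queue[0]
        order_queue.remove(j)
        if try_label(j, labeled):
            labeled.append(j)
            fresh = [n for n in neighbours[j]
                     if n in order_queue and n not in prior_queue]
            prior_queue.extend(fresh)      # propagate to the immediate neighbours
            stalled = 0
        else:
            order_queue.append(j)          # retry once more facts are available
            stalled += 1
    return labeled

# Toy scene: "t" junctions can only be labeled once a neighbour has been labeled.
junctions = {"a": "w", "b": "t", "c": "y", "d": "t"}
neighbours = {"a": ["b"], "b": ["a", "d"], "c": ["d"], "d": ["b", "c"]}

def try_label(j, labeled):
    return junctions[j] in EASY_TYPES or any(n in labeled for n in neighbours[j])

order = run_labeling(junctions, neighbours, try_label)
print(order)
```

In the toy scene the labeling spreads outward from the first easy junction through its neighbours, which is the propagation effect the text credits with reducing labeling time.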

The decomposition is conservative, i.e., it favours the 
separation of objects on concave edges. For a curve on 
which there is no feature point, a label is given 
according to its convexity: a concave curve is assigned 
as a concave boundary and a convex curve is assigned 
as a convex internal edge. The errors caused by an 
incorrect decomposition are expected to be rectified by 
facts collected later or by the knowledge stored at 
higher levels. 

MATCHING THE ENVIRONMENT MODEL 

The initial location of the robot in the environment is 
not known a priori. In order to determine the initial 
coordinates of the robot in a fixed coordinate system 
keyed to the environment, it is necessary to identify 
which entities in a framed view correspond to parts of 
the environment model. For this purpose, at least some 
edges of an entity should be matched with a connected 
part of the environment. Edges are stable, relative 
features which contain dimensional information and are 
at the lowest level (except for vertices). Since the 
geometry (shape and dimensions) of the environment is 
known, an edge-based matching process has been 
devised for matching the environment. 

The process first matches the completely visible real 
edges of entities from a framed view with the built-in 
environment model; this is done according to their 
attributes, the categories (e.g. planar or conical) and the 
directions of their adjacent faces. The edge attributes 
consist of: 

1. category, e.g., straight line, circle or other curve; 

2. type, e.g., shadow, occluding boundary, concave 
internal, convex internal, concave boundary, clipping line 
or limb; 

3. convexity; 

4. approximate length. 

If an edge in a view is matched with several edges in 
the environment model, then a "matching confidence" 
will be assigned to it which is proportional to the 
inverse of the number of matched pairs. An entity is 
considered to be a candidate for part of the 
environment model if at least its visible and well 
labelled internal and occluding edges match with edges 
in the environment model. From these candidates, 
entities will be designated as being parts of the 
environment on the basis of the following properties: 

1. at least two matched edges; 
2. a maximum number of matched edges;

3. a maximal sum of confidences for matched edges. 
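The selection rule above can be sketched as follows. This is a minimal illustration, not the implemented C-PROLOG system: the data layout and the names `edge_confidence` and `rank_candidates` are hypothetical, the confidence is taken to be exactly the inverse of the number of matched pairs, and properties 2 and 3 are assumed to be applied in that order of priority.

```python
def edge_confidence(num_model_matches):
    """Confidence taken as the inverse of the number of matched pairs."""
    if num_model_matches <= 0:
        return 0.0
    return 1.0 / num_model_matches


def rank_candidates(candidates):
    """Rank candidate entities for designation as parts of the environment.

    `candidates` maps an entity name to a list holding, for each of its
    view edges, the number of model edges that edge matched.
    Assumption: property 2 (number of matched edges) is compared first,
    property 3 (sum of confidences) breaks ties.
    """
    scored = []
    for name, match_counts in candidates.items():
        matched = [n for n in match_counts if n > 0]
        if len(matched) < 2:          # property 1: at least two matched edges
            continue
        total = sum(edge_confidence(n) for n in matched)
        scored.append((len(matched), total, name))
    scored.sort(reverse=True)         # property 2 first, then property 3
    return [name for _, _, name in scored]


ranking = rank_candidates({
    "wall": [1, 1, 2],    # three matched edges, confidences 1 + 1 + 0.5
    "floor": [1, 2, 2],   # three matched edges, confidences 1 + 0.5 + 0.5
    "box": [1],           # only one matched edge: rejected by property 1
})
```

Here `ranking` lists "wall" before "floor" because both have three matched edges but "wall" has the larger sum of confidences; "box" is dropped.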

Following this identification, a top-down analysis 
process, which propagates the matched facts according 
to the built-in model of the environment, will be 
applied to those entities in order to: 

1. Further verify the matched facts and find more
matching facts. If in the propagation an inconsistent
fact is discovered, then the initially matched entity will
be rejected.

2. Identify the matched vertices and determine the 
position of the current viewpoint of the robot. 

3. Rectify the result of the decomposition of the
current view. When concave edges that were initially
labeled as concave boundaries are revealed to be
concave internal edges of the environment, their
labels are revised and the corresponding bodies are
merged together.

Since the 3-D coordinates of two known feature 
points and their spherical coordinates in a view can be 
used to determine the position of a viewpoint, the 
position can be determined from any two matched 
edges. From a pair of matched edges and the 
approximate depths of their related points (e.g. end 
points), the process identifies two pairs of the best 
matching points. Since the 3-D coordinates of the 
environment points are known, the possible position of 
the current viewpoint can be calculated from the two 
pairs of points. After another pair of matching edges 
has been discovered by the matching propagation 
procedure or when the other pair of matched edges is 
used for propagation, the new facts will be used to 
confirm or rectify the position of the viewpoint and 
make it more precise. 
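A simplified sketch of this calculation is given below. It assumes, beyond what the text states, that the axes of the view are parallel to the axes of the environment coordinate system and that the spherical coordinates of each matched point are given as (range, azimuth, elevation), with the approximate depth used as the range. The function names are hypothetical.

```python
import math


def spherical_to_offset(rng, azimuth, elevation):
    """Vector from the viewpoint to a point seen at (range, azimuth, elevation)."""
    horizontal = rng * math.cos(elevation)
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            rng * math.sin(elevation))


def estimate_viewpoint(matches):
    """Average the viewpoint implied by each (model_point, spherical) pair.

    Assumption: view axes are parallel to the environment axes, so the
    viewpoint is simply the model point minus the viewing offset.
    Returns the mean estimate and the largest deviation from it, which
    can serve as a consistency check when more matches are added.
    """
    estimates = []
    for (px, py, pz), (rng, az, el) in matches:
        dx, dy, dz = spherical_to_offset(rng, az, el)
        estimates.append((px - dx, py - dy, pz - dz))
    n = len(estimates)
    mean = tuple(sum(e[i] for e in estimates) / n for i in range(3))
    spread = max(math.dist(e, mean) for e in estimates)
    return mean, spread


# Two matched points with known 3-D coordinates, seen from (1, 2, 1.5).
viewpoint, spread = estimate_viewpoint([
    ((4.0, 2.0, 1.5), (3.0, 0.0, 0.0)),
    ((1.0, 6.0, 1.5), (4.0, math.pi / 2, 0.0)),
])
```

A large `spread` would indicate inconsistent matches or poor depth estimates; further matched pairs can be appended to the list to refine the estimate.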

MATCHING PARTIAL BODY MODELS 

Once the environment model has been matched, the 
partial 3-D descriptions from the first view will be used 
as the initially constructed partial models of the bodies 
in the scene. 

In order to match the partial descriptions of bodies
derived from a new view with those partial models
constructed in the previous views, a multi-level feature
matching approach has been used. This approach first
matches the partially constructed models to those 3-D
descriptions in the current view by selecting those
reference vertices from the object models and the
environment model which have the following features:

1. they are valid vertices or Shadow Intersection
Points (SIP);

2. they are within the new view frame (though
sometimes they may be occluded and unseen);

3. for each reference vertex, either the directions of
the two constituent faces are known or the projection
of the vertex is a boundary point of an unknown area
in the current constructed map;

4. they are related to the objects of current interest.

Following the selection of the reference vertices, a
prediction process determines the possible matching
windows in the new view; this is done on the basis of
the approximate position of the new viewpoint and the
coordinates of the reference vertices, which may only
have approximate values stored in the models. The
process finds the candidates for matching in the new
view which are located within the matching windows.
The window sizes are determined by the tolerance errors
of the robot movement, the errors of the coordinates of
the reference points and the relative positions of the new
viewpoint and the reference points.

In the knowledge base, there is a "Junction Family
Dictionary". In the dictionary, each family consists of
the possible junction types for a specific kind of vertex
when it is viewed from different positions. Using the
Junction Family Dictionary, and the categories and
directions of the constituent faces, the matching process
assigns each candidate a confidence. The candidate
which has the uniquely highest confidence will be
chosen as the matched vertex for a reference vertex,
and its corresponding faces will be considered as
matched faces. After finding a matched pair of vertices,
the matching process propagates the fact along the
emanating edges to adjacent vertices. A depth-first
search is used at each pair of matched vertices, and
when partial edges, unmatched vertices caused by
occlusion or already matched facts appear, the match
propagation for that direction will stop. Thus, different
levels of features (i.e. faces, edges and vertices) can be
matched in the same propagation process.
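The depth-first propagation can be sketched as below. The graph layout is an assumption made for illustration: each vertex maps to a dictionary from an edge label to the neighbouring vertex, and two edges are taken to correspond when their labels are equal (a stand-in for the comparison of edge attributes). The name `propagate_matches` is hypothetical.

```python
def propagate_matches(model, view, seed):
    """Depth-first propagation of a vertex match along emanating edges.

    `model` and `view` map a vertex to {edge_label: neighbouring_vertex}.
    `seed` is the initial (model_vertex, view_vertex) pair.
    """
    matched = {}          # model vertex -> view vertex
    used_view = set()
    stack = [seed]        # explicit stack gives depth-first order
    while stack:
        m, v = stack.pop()
        if m in matched or v in used_view:
            continue      # already matched fact: stop in this direction
        matched[m] = v
        used_view.add(v)
        for label, m_next in model.get(m, {}).items():
            v_next = view.get(v, {}).get(label)
            if v_next is None:
                continue  # edge missing in the view (e.g. occluded): stop
            stack.append((m_next, v_next))
    return matched


model = {
    "A": {"e1": "B", "e2": "C"},
    "B": {"e1": "A", "e3": "D"},
    "C": {"e2": "A"},
    "D": {"e3": "B"},
}
# In the view, edge e3 and vertex D are hidden by occlusion.
view = {
    "a": {"e1": "b", "e2": "c"},
    "b": {"e1": "a"},
    "c": {"e2": "a"},
}
result = propagate_matches(model, view, ("A", "a"))
```

Starting from the matched pair (A, a), the propagation reaches B and C but stops at the occluded edge, so D stays unmatched.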

For those constructed body models which do not have 
any vertex or feature point (e.g. SIP), the edges and 
faces will be chosen as the basic matching elements. 
According to their categories, types, shape parameters 
and approximate positions, the corresponding feature 
elements can be found. Also the matched facts will be 
propagated to their related feature elements. 

After the faces have been matched, matched bodies can 
be derived from these faces. Following this, the model 
updating process searches the matched facts starting 
with high level features and moving to low level 
features; the low level features which do not exist in 
the body model are now filled in by the known parts 
of the current 3-D body description which are matched. 
After matching the partially constructed models to those 
3-D descriptions in a new view, unmatched bodies in 
the new view are identified. In order to test whether 
these are newly discovered bodies, a reverse direction 
matching process is used to check whether any vertex 
in an unmatched body has a corresponding vertex in 
the body models or the environment model. If this is 
not the case, then the body is new and is added to the 
database of the body models; otherwise the appropriate
matched model will be found. Meanwhile, features
separated in the new view or in the constructed models
may be merged into one if their correspondence is
unique in one of the two 3-D representations. The
related revision will also be done.

POSITION ADJUSTMENT 

In practice, data gathered by a robot vision system 
always includes certain tolerance errors. Although the 
relative positions of the viewpoints can be derived from 
a robot servo system, this information is generally 
imprecise. Since any two views form a pair of wide 
angle stereo images, the matching process provides the 
information necessary to calculate the position of the
sensor (the robot camera) quite precisely. This 
information can be used to correct the position 
calculated by the movement control servos and used for 
dynamically adjusting the robot movement. 


Figure 2 shows two synthesized views from the scene 
shown in Figure 3; they have been successfully analyzed 
by the system described above. 


IDENTIFICATION OF UNKNOWN PARTS 

The ambiguities caused by special alignments and 
accidental alignments generally can be distinguished by 
using multiple views. For example, when a strange 
junction type occurs in a view, if from other views the 
matched points belong to the same family of junctions, 
then it is caused by a special alignment, otherwise it is 
caused by accidental alignment. The ambiguity caused 
by accidental alignments can be ignored. For a special
alignment, the ambiguity can often be solved by a
correct decomposition, though sometimes higher
knowledge of the scene may be needed.

Inside a body, self-occlusion may result in unknown 
occluded parts. Between bodies, an occlusion may cause 
the occluded bodies to be unidentified. For these two 
cases, unknown parts only occur at the occluding edges. 
Besides, a concave boundary edge indicates that the two
related bodies are touching, and hence the touching
parts cannot be seen if there is no means to change
the status of the bodies.

In the system described here, when a model of a 
body has been created, only the internal, occluding and 
concave boundary edges which are the real edges of the 
body are created. The model also contains a list of its 
boundary edges and a list of the bodies which occlude 
it. When a newly discovered surface is added into a
body model, it is necessary to change the types of
those occluding edges in the body model which are
matched with the edges of the added surface. These
edges become the internal edges of the body and are
deleted from the boundary list of the partial model of
the body. Thus boundary occluding edges of a body
model always indicate the self-occlusion of parts and
the need for further attention.
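The bookkeeping described in this paragraph can be sketched as a small data structure. The class `BodyModel`, its field names and the rule that previously unseen edges of a new surface start as occluding edges are assumptions made for illustration; the list of occluding bodies is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class BodyModel:
    edge_types: dict = field(default_factory=dict)   # edge id -> edge type
    boundary: list = field(default_factory=list)     # ids of boundary edges

    def add_surface(self, surface_edges):
        """Add a newly discovered surface given the ids of its edges."""
        for edge in surface_edges:
            if self.edge_types.get(edge) == "occluding":
                # A matched occluding edge becomes an internal edge and
                # leaves the boundary list.
                self.edge_types[edge] = "internal"
                if edge in self.boundary:
                    self.boundary.remove(edge)
            elif edge not in self.edge_types:
                # Assumption: unseen edges of the new surface start as
                # occluding boundary edges.
                self.edge_types[edge] = "occluding"
                self.boundary.append(edge)

    def needs_attention(self):
        """Boundary occluding edges mark parts that are still unknown."""
        return [e for e in self.boundary if self.edge_types[e] == "occluding"]


body = BodyModel(
    {"e1": "internal", "e2": "occluding", "e3": "occluding"},
    ["e2", "e3"],
)
body.add_surface(["e2", "e4"])   # the new surface shares edge e2
```

After the call, e2 is internal and the edges still needing attention are e3 and e4.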

The "t" type junctions caused by occlusion are kept in 
the input image databases. Although they are not the 
vertices of a body, they are important points for the 
construction of a map of the scene and for discovering 
the unknown parts caused by occlusion. For an
occluded body, discovering its occluded parts is
accompanied by a search for its "t" type points and
those incompletely seen edges and surfaces which relate
to the "t" type points.

All of the above outcomes will be organized and 
analysed by a view planning system in order to further 
resolve the ambiguities. This component has not yet
been developed and implemented.

EXPERIENCE 

The system for matching and constructing 3-D body 
models has been implemented by using C-PROLOG 
under the UNIX operating system on a VAX 11/750. 
Figure 2. The two synthesized views which have been
successfully analyzed by the described system.

Figure 3. An example of a typical scene (two views of this
scene are used in Figure 2).
CONCLUSION

Under the assumptions described in the introduction,
the system described here can incrementally construct 3-D
body models in an office or warehouse environment
by matching planned multiple views. No prior
knowledge of the objects is required by this system.
The system includes the following important features:

1. a framed view is decomposed and partial 3-D
descriptions of the view are constructed;

2. partial 3-D descriptions of a view are matched with
the built-in model of the robot environment;

3. partial descriptions of bodies derived from the
current framed view are matched with those partial
models constructed from the previous views;

4. the new information in the current view is
identified and the models are updated;

5. the unknown parts of the models which are being
constructed are identified so that further vantage
viewpoints can be planned.

Together with a view planning system [1] and the CSG-EESI
3-D model conversion system [2], this system offers
a good basis for constructing a higher level image
understanding system for an intelligent robot.

As noted above, the system has been implemented in 
C-PROLOG under the UNIX operating system on a 
VAX 11/750, and has been tested successfully with 
synthesized images. While C-PROLOG provides a good 
environment for testing the ideas used in this system, 
any practical implementation would have to be more 
efficient. 
ABSTRACT

In this study, we analyze the problem of detection 
of depressions or drop-offs in the automated 
guidance of roving robots. The proposed approach 
is based on the principle that if one is too near a 
depression, one is bound to see new information 
which initially was occluded. To exploit this 
principle, two steps are undertaken. The first 
step involves the derivation of the correspondence 
process to allow the vision system to relate a 
location of interest in a sequence of frames. The 
second step involves the development of methods to 
detect and identify, in this location of interest, 
the occluded information. 

RESUME 

In this study we present a visual method for detecting
terrain depressions (hollows, slopes, curbs, etc.) with
the aim of ensuring safe passage for an autonomous robot.
The method is based on the principle that, as one
approaches a terrain depression, one is bound to perceive
certain elements of information that were not previously
apparent. To exploit this principle, two steps are taken.
In the first step, we establish the relationship that
exists between a series of photographs, so that the robot
can recognize the same location of interest in two or
more of them. These photographs are taken in succession
while approaching the location of interest. In the second
step, we describe the techniques needed to identify and
extract the elements of information that characterize the
presence of a terrain depression.

INTRODUCTION 

Depressions or drop-offs constitute a serious
problem in the automated guidance of roving
robots. Unfortunately, the detection of depressions
is also a complex image analysis problem. In
the human vision system, many visual cues such as
stereopsis, occlusion cues, context in the viewed
scene, change in textural properties, etc., are all
interpreted and integrated with relative ease to
yield an almost effortless perception of what, in
fact, is a complex perceptual task. In image
processing, however, a computer implementation
exploiting any one of the aforementioned cues
becomes a complex information processing problem.

Clearly, there is no simple way to solve this
problem. In the approach proposed here, the aim is
to extract the occluded information given a
sequence of frames based on methods which allow for
a relaxed image correspondence process between
these frames. The two methods devised here make
use of intensity profiles or pixel intensity
distributions. The first method identifies primary
cues which suggest the presence of a depression.
The second method extracts the occluded information
to confirm the presence of a depression.

Before we describe the approach for the detection
of depressions, we first take a broad view of
the automated guidance of roving robots. In this
view, with the aim of extending beyond the ideal
settings generally considered, we make an assessment
of real-world scenes identifying pertinent problems
towards enhanced guidance of roving robots.

SCENE INTERPRETATION PROCESS 

Figure 1 illustrates a process of analysis and
interpretation of real-world scenes. In this
process, the first function of the vision system is
to provide the robot with a safety path. To carry
out this function, the vision system uses the
first-pass evaluation process [1], which exploits the
surface consistency constraint [2] by comparing the
environment ahead of the robot with an initial
environment which is already determined to be
obstacle-free. Computer results of an implementation
of this first-pass evaluation on outdoor
scenes are shown in Figure 2. The second function
of the vision system is to provide the robot with
the needed additional information in the event
that an object blocking the path of travel is
detected, or if some landmark needs to be identified.
To carry out this second function, the
objects the robot is likely to encounter are categorized,
and their essential visual characteristics
are identified. In the process of Figure 1, these
categories are: (1) shadows (false alarm),
(2) depressions, (3) upright objects, and (4) flat
objects. The essential features characterizing the
above categories are:

1. Shadow. A surface upon which a shadow is cast
will preserve its intrinsic physical characteristics.
This is due to the relatively uniform
effect of shadow on the image gray level
intensities [1,4].

2. Depressions. In approaching a depression, one
is bound to see new information that was previously
occluded. We call this the occluded
information.

3. Upright objects. Most man-made upright objects
have straight vertical edges. The few other
man-made objects together with the natural
objects which do not have straight vertical
edges can be distinguished by the manner in
which they project onto the two-dimensional
(2-D) image plane. An upright object projects
in the 2-D image plane proportionally to the
depth of field it occludes [5].

4. Flat objects. Flat objects are affected by
perspective. Also, a flat object projects in
the 2-D image plane proportionally to its
length [5].

A methodology for guiding the robot through a 
given scene can be: 

1. As an initial step, the vision system takes 
left, front and right images of the scene to 
acquire a wide-angle view. Each image is 
analyzed using the first-pass evaluation 
process. The results obtained from the three 
images are integrated to yield an optimal 
tracing of the safety path. We refer to this 
step as the initialization phase. 

2. The robot takes the optimal path, and the
vision system is directed to enter what we
refer to as the motion phase. In this motion
phase, the vision system processes images in
the chosen direction of travel. The wide-angle
view is no longer necessary, unless an obstruction
is encountered and a new direction of
travel must be taken. Moreover, the image
taking process is a function of the range of
the safety path. For example, if a path is
obstacle-free for x steps, then an image may be
taken every x_k steps, where x_k is the integer
part of the fraction x/k and k = 2, 3, 4, ...
depending on how large x is. In this phase,
essential safety path cues such as path clear,
obstacle ahead, turn left/right, are provided
in real-time, and the timing of the vision
system is such that it is always processing a
few steps ahead of the robot.

3. If an object is found along the direction of
travel, the vision system issues a warning
signal to the robot and directs it to pause and
take another picture. The system then determines
the range and extent of the object, and
provides the robot with the necessary avoidance
cues. If identification of the object is
desired, the system enters the identification
process. We refer to this step as the warning/
identification phase. This phase can be
carried out either in a sequential mode or in a
parallel mode. In the sequential mode, the
processing task for which the primary cues can
be found with the least amount of processing
time is performed first. If the results are
not conclusive, the next processing task is
performed, and so on. In the parallel mode,
all processing tasks are initiated simultaneously,
and execution of these tasks ends as soon as
the first primary cues are determined.
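Two parts of this methodology lend themselves to a short sketch: the image-taking interval of the motion phase and the sequential mode of the warning/identification phase. The function names, the task tuples and the detector callables below are hypothetical, and the floor of one step in `image_interval` is an added assumption; how k is chosen from x is not specified in the text and is left to the caller.

```python
def image_interval(clear_steps, k):
    """Steps between images: the integer part of clear_steps / k."""
    if k < 2:
        raise ValueError("k is expected to be 2, 3, 4, ...")
    # Assumption: never wait less than one step between images.
    return max(1, clear_steps // k)


def run_sequential(tasks, image):
    """Run tasks in order of increasing estimated cost until one is conclusive.

    `tasks` is a list of (estimated_cost, name, detector) tuples; each
    detector returns True when its primary cues are found in `image`.
    Returns the name of the first conclusive task, or None.
    """
    for _, name, detector in sorted(tasks, key=lambda t: t[0]):
        if detector(image):
            return name
    return None


tasks = [
    (3.0, "upright object", lambda img: True),
    (1.0, "shadow", lambda img: False),
    (2.0, "depression", lambda img: img["new_peaks"] > 0),
]
interval = image_interval(12, 3)                      # path clear for 12 steps
category = run_sequential(tasks, {"new_peaks": 2})    # toy image description
```

In this toy run the cheapest task (shadow) is inconclusive, so the next cheapest (depression) is tried and succeeds; the most expensive task is never executed.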

In organizing all these information processing 
tasks to yield an integrated vision system, the 
following important points are considered: 


1. Implementation of a methodology and a decision
making process to ensure that an information
processing task is initiated only if primary
cues justify its execution;

2. Allowing for concurrent processing in the 
development of these information processing 
tasks; 

3. Allowing for one processing task to call upon 
another processing task if ambiguities arise in 
the image interpretation results. 

We now describe the approach for the detection 
of depressions. This description starts with a 
presentation of the image correspondence process. 

IMAGE CORRESPONDENCE PROCESS 


For the image correspondence process, we
derive equations for the correspondence, in both
range and width, of any two image points, going
from one frame to the next. But first, we need to
define the mapping principles between the three-dimensional
(3-D) real world and the 2-D image.
Given Figure 3, using properties of similar triangles,
measurements in width (W) and range (R) in
the real-world environment are mapped in the (x,y)
image coordinate system by the following
relationships:

$$y_k = \frac{f\,[R(y_k) + h\tan\alpha] + f\,[R(y_k)\tan\alpha - h]\tan(\beta+\alpha)}{[R(y_k) + h\tan\alpha]\tan(\beta+\alpha)} \tag{1}$$

$$x_j - x_i = \frac{f\,W(x_i, x_j)}{f + [R(y_i) - R(y_0)]} \tag{2}$$

where h is the camera height; f is the camera focal
length; $\alpha$ is the camera tilt angle; and $\beta$ =
arctan(L/h), with L being the range between the
camera and the first point viewed by the camera.
Note that if we let $\alpha = 0$, Eq. (1) takes the simple
form

$$y_k = \frac{f h\,[R(y_k) - L]}{L\,R(y_k)} \tag{3}$$
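As a numerical check of the mapping, the sketch below evaluates Eq. (1) and its zero-tilt special case, Eq. (3). The function names and the sample values of f, h and L are illustrative assumptions; units only need to be consistent.

```python
import math


def image_row(R, f, h, L, alpha=0.0):
    """Image row of a ground point at range R, following Eq. (1)."""
    beta = math.atan(L / h)
    t = math.tan(beta + alpha)
    near = R + h * math.tan(alpha)
    return (f * near + f * (R * math.tan(alpha) - h) * t) / (near * t)


def image_row_untilted(R, f, h, L):
    """Special case of zero tilt angle, Eq. (3)."""
    return f * h * (R - L) / (L * R)


f, h, L = 0.05, 1.0, 2.0            # hypothetical camera parameters
row_general = image_row(4.0, f, h, L)
row_simple = image_row_untilted(4.0, f, h, L)
row_first_point = image_row(L, f, h, L, alpha=0.1)
```

With zero tilt the two functions agree, and the first point viewed by the camera (R = L) maps to row zero for any tilt angle, which is a quick consistency check on Eq. (1).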


1. Range correspondence: The objective here is to
find how two points, $y^1_{k1}$ and $y^1_{k2}$, in the
vertical axis of the first frame map into
points, $y^2_{k1}$ and $y^2_{k2}$, in the second frame. The
difference in the range of the two frames is $r_1$.
The superscript, 1 or 2, denotes the frame identity.
Coordinate $y^1_{k1}$ is the point where the
object area starts, and point $y^1_{k2}$ is an arbitrary
point a small distance away from $y^1_{k1}$.
The distance between these two points depends
on the extent of the detected object. For simplicity,
let us assume that the tilt angle $\alpha$ of
the camera plane is zero. To find the mapping
between points $(y^1_{k1}, y^1_{k2})$ and $(y^2_{k1}, y^2_{k2})$, we use
Eq. (3) to obtain the following relationships:

$$y^1_{k1} = \frac{f h\,[R(y^1_{k1}) - L]}{L\,R(y^1_{k1})} \tag{4}$$

$$y^2_{k1} = \frac{f h\,[R(y^1_{k1}) - r_1 - L]}{L\,[R(y^1_{k1}) - r_1]} \tag{5}$$

where $R(y^1_{k1})$ is estimated by the first-pass
evaluation. Similarly, since $y^1_{k2}$ is arbitrarily
chosen, its range $R(y^1_{k2})$ is easily
determined, and thus we obtain

$$y^1_{k2} = \frac{f h\,[R(y^1_{k2}) - L]}{L\,R(y^1_{k2})} \tag{6}$$

$$y^2_{k2} = \frac{f h\,[R(y^1_{k2}) - r_1 - L]}{L\,[R(y^1_{k2}) - r_1]} \tag{7}$$

2. Width correspondence: In order to save processing
time, it is useful to focus only on a
certain width of the image where the object is
detected. So, if we choose a segment of width
delimited by $x^1_{k1}$ and $x^1_{k2}$ in the first frame, we
need to find their corresponding projections in
the second frame. This correspondence is found
using Eq. (2). By substituting $R(y^1_{k1})$ for
$f + [R(y_i) - R(y_0)]$, for the first frame, we have

$$x^1_{k1} - x^1_{k2} = \frac{f\,W(x^1_{k1}, x^1_{k2})}{R(y^1_{k1})} \tag{8}$$

For the second frame, we have

$$x^2_{k1} - x^2_{k2} = \frac{f\,W(x^1_{k1}, x^1_{k2})}{R(y^1_{k1}) - r_1} \tag{9}$$

Using these relations, the vision system can
now relate a location of interest in any two
distinct frames separated by an arbitrary range
$r_1$ (see Figure 4).

EXTRACTION OF THE OCCLUDED INFORMATION

Occluded information that is revealed in a
subsequent frame can be perceptually very deceptive
(see Figure 5). This results from the fact that if
we cannot locate the same point of reference in the
two frames then we may conclude that there is no
relationship between these two frames. This
stresses the importance of a reference point from
which the system starts to look for occluded
information; in our analysis this reference point is
chosen in the proximity of the object as indicated
by the first-pass evaluation. Moreover, the detection
of occluded information will disturb many of
the physical relationships that previously existed
between the various elements in the given scene.
Utilizing these two ideas, we describe two simple
methods to extract this occluded information.

Method 1

Step 1: Take a vertical scan from point $y^1_{k1}$ to
point $y^1_{k2}$ to generate a vertical pixel
intensity distribution or profile. We
call this profile $p^1_{k1k2}$. Similarly,
generate a vertical pixel intensity
profile, $p^2_{k1k2}$, between $y^2_{k1}$ and $y^2_{k2}$.

Step 2: Locate the number of major disturbances
(peaks) in the profile $p^2_{k1k2}$ when compared
with the peaks in $p^1_{k1k2}$. If there exists
a difference in the number of major disturbances
in the two profiles, then this
implies the detection of the occluded
information. By major disturbance or
peak, we mean a point in the profile whose
gray level value exceeds the value $P_{\max}$
given by the standard relation [6]

$$P_{\max} = \mu_p + 0.5\,\sigma_p$$

for the same profile. Parameters $\mu_p$ and
$\sigma_p$ are the mean and standard deviation
of the intensity profile, respectively.

This method is used by the integrated vision system
for determining the primary cues.

Method 2

Step 1: Obtain a few horizontal scans between
points $x^1_{k1}$ and $x^1_{k2}$ of the first frame
starting at the vertical coordinate $y^1_{k1}$.
We call these intensity profiles $p^1_{k1k2}(i)$,
where i denotes the i-th scan. Obtain
similar scans from the second frame, using
the coordinate $y^2_{k1}$ as the starting point
and $x^2_{k1}$ and $x^2_{k2}$ as the horizontal limits.
We call these intensity profiles $p^2_{k1k2}(i)$.

Step 2: As in step 2 of the previous method,
locate a difference in the number of major
peaks appearing in $p^2_{k1k2}(i)$ when compared
to $p^1_{k1k2}(i)$. If peaks are found, occluded
information is detected.

This method confirms the results obtained using the
previous method.

Computer examples of this procedure are illustrated
in Figure 6. Note that when no occluded
information is found in this analysis, the object
remains a potential obstacle.
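The two methods share one test: count the major peaks of an intensity profile in each frame and compare the counts. The sketch below illustrates that test together with the second-frame relations of Eqs. (7) and (9) at zero tilt angle. Several choices are assumptions made here: a peak is counted as a run of consecutive samples above the threshold, the population standard deviation is used, and all function names and sample profiles are hypothetical.

```python
from statistics import mean, pstdev


def row_in_second_frame(R1, r1, f, h, L):
    """Row of a point at first-frame range R1 after advancing by r1 (Eq. 7 form)."""
    R2 = R1 - r1
    return f * h * (R2 - L) / (L * R2)


def width_in_second_frame(W, R1, r1, f):
    """Projected width of a real-world width W in the second frame (Eq. 9)."""
    return f * W / (R1 - r1)


def count_major_peaks(profile):
    """Count runs of samples above P_max = mean + 0.5 * standard deviation."""
    p_max = mean(profile) + 0.5 * pstdev(profile)
    peaks, inside = 0, False
    for value in profile:
        above = value > p_max
        if above and not inside:
            peaks += 1            # a new run above the threshold starts
        inside = above
    return peaks


def occluded_information_found(first_profile, second_profile):
    """True if the number of major peaks differs between the two frames."""
    return count_major_peaks(first_profile) != count_major_peaks(second_profile)


# Toy gray-level profiles over the same location of interest.
first = [10, 10, 11, 10, 40, 41, 10, 10, 11, 10]
second = [10, 10, 35, 36, 10, 40, 41, 10, 38, 10]
found = occluded_information_found(first, second)
```

In this toy example the second profile shows more major peaks than the first, so the test reports occluded information; equal counts would leave the object classified as a potential obstacle.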

CONCLUSION 

We described in this study an approach for the 
detection of depressions. This approach is based 
on finding occluded information from a sequence of 
frames. We noted that if this occluded information 
was not found, the object in question remains a 
potential obstacle, and the appropriate processing 
task is initiated to identify its nature. An
attractive feature of this approach is that the
methods used for the extraction of occluded information
allow for a relaxed image correspondence
process. For example, if a given peak in the
intensity profile on an initial frame is missing in
the corresponding intensity profile of a subsequent
frame, it becomes sufficient to look for a disturbance
in these peaks going from one frame to the
next. A computer implementation of this approach
on real-world scenes produced very good results.
Moreover, in the first part of this study, we 
discussed an image interpretation process which 
identifies pertinent problems towards enhanced 
guidance of roving robots. 
Figure 1. Process of Analysis of Real-World Scenes



Figure 2. Results of the First-Pass Evaluation 
on Two Outdoor Scenes 


Figure 3. Mapping of Real-World 
Measurements onto the 
Image Plane 
Figure 4. Image Correspondence for Extracting Occluded Information.

(a) Correspondence in Range; (b) Correspondence in Width; 
(c) Mapping the Reference Points; (d) Mapping the Location 
of Interest. 
(a) Input Image A


(b) Closer Range of Input Image A 


Figure 5. Need for a Point of Reference to Extract Occluded Information 




Figure 6. Extraction of Occluded Information. Input images and their
(a) corresponding vertical intensity profiles with occluded information
revealed in the close range frame;
(b) corresponding horizontal intensity profiles with occluded information
revealed in the close range frame.
A COMPUTATIONAL THEORY OF 3D SHAPE RECONSTRUCTION

FROM IMAGE CONTOURS 


Ping Liang and John S. Todhunter 
ABSTRACT

The computational theory of 3D shape reconstruction 
from image contours proposed in this paper is based on the 
variational principles and has the theoretical framework 
suggested recently by Poggio [8]. By this theory, 3D shapes 
can be reconstructed from image contours by general physical 
constraint assumptions, namely the assumptions of minimum 
potential energy, isotropism and homogeneity of the material, 
and properly defined energy functionals. It is assumed that 
contours have been classified as surface discontinuity boundary 
contours, surface contours and extremal boundaries. 
Minimization of the energy functionals tends to maximize the
symmetry and orthogonalize the surface junctions of the
reconstructed object. Some earlier findings follow as
natural results of the theory. Theoretical developments and
experimental results are successful and promising.

EXTENDED SUMMARY OF THE THEORY 

One important function of early vision is the 
reconstruction of a 3D representation of a scene from 2D 
images. Stereopsis and structure from motion are the most 
explored in vision studies. Stereopsis and structure from 
motion require multiple images, but humans have the ability 
to perceive (with illusion) the 3D environment with only one 
eye or from a single picture. 

There exist many sources of information about surfaces 
in an image such as texture, shading, shadow, etc. [9,12,13,17], 
but those methods are only applicable for certain special 
situations. It has been shown that shape reconstruction from 
contours is significantly more powerful than shape 
reconstruction from textures [7]. Barrow and Tenenbaum 
[11] argued that shape reconstruction from boundary contours
is of fundamental importance in explaining surface perception
and more important than shape reconstruction from shading.
Steven [14] showed that surface contours also play an
important role in shape reconstruction from images.

Theoretical studies [8] showed that the computational, 
ill-posed nature of early vision problems leads naturally to the 
application of the mathematical theory of regularizing ill- 
posed problems for solving them in terms of variational 
principles that enforce general physical constraints derived 
from a physical analysis of the problem. The constraints 
should be derived as a natural consequence of the physical
laws governing the world we live in. The results of this 
research will not only impact our understanding of the early 
visual system in biological organisms, but also lead to 
development of computational algorithms and hardware designs 
for machine vision. 

The fact that the human visual system has definite and
consistent interpretations of contour images shows that it
exploits some implicit assumption about the world. Without
any knowledge of the nature of the process which generated
the 3D shape, it is reasonable to assume that the given 2D
contour is most likely to correspond to the projectively
equivalent 3D shape with minimum potential energy. It is
well known from classical mechanics that a physical system is
stable if and only if its total potential energy is minimal. In
many cases, it is also justified on the grounds that the
surfaces tend to assume smooth and minimal energy
configurations. Because there is no information available in
the image contours about the material of the surface, the only
reasonable assumption is that the surface material is isotropic
and homogeneous. The computational theory of shape
reconstruction from contours proposed in this paper is based
on the variational principles in terms of general physical
constraint assumptions: (1) the minimum energy principle;
(2) isotropism and homogeneity of the material, i.e., the
uniformity of the energy distribution.

To deform a system with minimum total potential energy 
to a non-minimum energy system, external energy has to be 
converted into potential energy in the system. As is well
known, the differential equations describing a system in a
non-minimum energy state are far more complicated than the
equations describing the system in its minimum energy state.
Non-uniformity of energy distribution represents information 
about the system. Thus we can draw a correspondence 
between energy, energy distribution and information. 
Therefore the interpretation of the image contours by this 
theory is a minimum information interpretation in some sense. 

Earlier studies such as [3,4,5,6,9,10,11,12,13,14,17] 
motivate and support the theory proposed in this paper. 
Barrow and Tenenbaum [11] optimized a smoothness measure 
to reconstruct planar curves and polyhedra. The optimization
criteria used in [11] for continuous curves and straight lines
are different, whereas a complete theory of shape
reconstruction from contours should be able to accommodate
both cases. The optimization criterion developed in [1] by 
the authors, which is a preliminary version of the theory 
proposed in this paper, uses a single underlying mechanism for 
both continuous curves and straight line contours. Witkin 
[12] developed a maximum likelihood approach to shape 
reconstruction from contours, and achieved some success in 
interpreting irregularly shaped objects. This method is
ineffective when the contour has a regular shape, and it does
not compute the right slant of an ellipse. Brady and Yuille [7]
developed an extremum principle which maximizes the ratio of
the area to the square of the perimeter. Their method would
be ineffective for curved surfaces and for images with both
boundary and surface contours. The theoretical framework
proposed by Poggio [8] lends strong support to the theory 
proposed in this paper. 

Kanade [9,17] developed a systematic method to recover 
3D shapes from a single view by mapping image geometric 
properties into shape constraints. He proposed the assumption 
of mapping 2D skewed symmetry into 3D symmetry, and 
proved that the skewed symmetry can be a projection of real 
symmetry if and only if its surface gradient is on a certain 
hyperbola in the gradient space. We have proved that 
Kanade’s assumption and hyperbola are natural results of the 
theory we proposed. Barnard [10] recently proposed a 
maximal orthogonal principle for 3D recovery based on 
psychophysical data. This principle is further developed and 
incorporated into our theory. 

Some work has been done which supports the minimum
energy principle approach to shape reconstruction.
Grimson [5,6] and Terzopoulos [3,4] used a thin plate model
and constructed the 3D surfaces from the scattered stereo
depth data by minimizing the total potential energy of the
thin plate. The work by Barrow and Tenenbaum [11] is based
on a similar idea in interpreting line drawings by optimizing a
"smoothness" measure.

The outline of the theory is summarized as follows. It 
is assumed that the contours and junctions have been classified 
as surface discontinuity boundaries and junctions, surface 
contours, and extremal boundaries. The classification itself is
a very important problem, and there is still no complete
solution to it. Orthographic projection is assumed
throughout.

First the reconstruction of a single surface from a simple
closed 2D boundary contour is considered. Suppose that the
2D boundary contour has continuous curvature $\kappa_0(t)$ and is
given by $r_0(t) = (x_0(t), y_0(t))$, $t \in T = [a,b]$,
$x_0(a) = x_0(b)$, $y_0(a) = y_0(b)$, where $t$ is a parameter
invariant under magnification or contraction. Backproject
$r_0(t)$ into 3D as $r(t) = (x_0(t), y_0(t), z(t))$ such that $r(t)$
has continuous curvature $\kappa(t)$ and torsion $\tau(t)$. Let

$$ (\kappa(t), \tau(t)) = (\kappa_t(t), \tau_t(t)). \qquad (1) $$

Define a vector

$$ p(t) = \begin{cases}
(\kappa_t(t),\ \tau_t(t),\ \kappa'_t(t),\ \tau'_t(t)), & \text{where } r_0(t) \text{ is convex,} \\
(\pi - \kappa_t(t),\ \pi - \tau_t(t),\ \kappa'_t(t),\ \tau'_t(t)), & \text{otherwise.}
\end{cases} \qquad (2) $$

Suppose the derivatives $\kappa'_t, \tau'_t \in L^2$. Let
$A = \{ r(t) \mid r(t) = (x_0(t), y_0(t), z(t)),\ \kappa(t), \tau(t)
\text{ continuous},\ \kappa', \tau' \in L^2 \}$ and
$B = \{ p(t) \mid r(t) \in A \}$. $C_c^0(\Omega)$ denotes the set of all
continuous functions with compact support. Let $H^m$ be the Sobolev
spaces, and let $H_0^m$ be the completion of $C_c^0(\Omega)$ in the norm
$\| \cdot \|_{m,\Omega}$. Note that for $p(t) \in B$,
$\kappa_t(t), \tau_t(t) \in H^1$.

Define an inner product as

$$ (p_1, p_2) = \frac{k_\kappa}{2} \int_T \kappa_{t1} \kappa_{t2}\, dt
 + \frac{k_\tau}{2} \int_T \tau_{t1} \tau_{t2}\, dt
 + \frac{k'_\kappa}{2} \int_T \kappa'_{t1} \kappa'_{t2}\, dt
 + \frac{k'_\tau}{2} \int_T \tau'_{t1} \tau'_{t2}\, dt \qquad (3) $$

where $k_\kappa$, $k_\tau$ are called energy factors of curvature and
torsion, and $k'_\kappa$, $k'_\tau$ are called uniformity factors of
curvature and torsion. The integrals are Lebesgue integrals.
When $p_1 = p_2$, the first two terms are measures of the
potential energy in the reconstructed shape, and the last two
terms are measures of the uniformity of the potential energy
distribution.

The assumption that $\kappa_t(t), \tau_t(t) \in H^1$ is a reasonable
smoothness assumption of the curve can also be justified from
the property of embedding $H^m(\Omega)$ into $C^j(\Omega)$. Suppose
$\Omega$ is a subset of $R^n$; if $m > j + n/2$, then $H^m(\Omega)$ is
embedded in $C^j(\Omega)$. For the surface case [3,4,5,6], the
smoothness assumption $u \in H^2$ implies that $u \in C^0$. In the
case of curves, $\kappa_t(t), \tau_t(t) \in H^1$ implies that
$\kappa_t(t), \tau_t(t) \in C^0$, i.e., the curve has
continuous curvature and torsion.

Then the reconstruction of the surface is formulated as
the following variational problems. The 2D contour $r_0(t)$ is
first backprojected into 3D by minimizing

$$ I_1(p(t)) = (p(t), p(t)). \qquad (4) $$

If the minimum is reached by $r^*(t) = (x_0(t), y_0(t), z^*(t))$,
then a surface $u(x,y)$ is interpolated by minimizing

$$ I_2(u) = \iint_\Omega \left\{ \tfrac{1}{2} (\Delta u)^2
 - (1-\sigma)(u_{xx} u_{yy} - u_{xy}^2) \right\} dx\, dy \qquad (5) $$

with the inhomogeneous Dirichlet boundary condition

$$ u|_{\partial\Omega} = g(x,y) = z^*. \qquad (6) $$
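As a rough numerical illustration of the functional (5), the sketch below approximates I_2(u) on a uniform grid with central differences. The function name `thin_plate_energy`, the grid spacing `h`, and the restriction to interior grid points are assumptions made for this example only; the paper itself relies on the finite element method rather than finite differences.

```python
import numpy as np

def thin_plate_energy(u, h, sigma=0.0):
    """Finite-difference approximation of I_2(u) in Eq. (5).

    u     : 2D array of surface heights sampled on a uniform grid
            (axis 0 is x, axis 1 is y); hypothetical input layout.
    h     : grid spacing (assumed equal in x and y).
    sigma : the constant sigma appearing in Eq. (5).
    Only interior grid points contribute, so boundary rows and
    columns act purely as neighbours for the differences.
    """
    u = np.asarray(u, dtype=float)
    c = u[1:-1, 1:-1]
    # Second derivatives by central differences.
    u_xx = (u[2:, 1:-1] - 2.0 * c + u[:-2, 1:-1]) / h**2
    u_yy = (u[1:-1, 2:] - 2.0 * c + u[1:-1, :-2]) / h**2
    u_xy = (u[2:, 2:] - u[2:, :-2] - u[:-2, 2:] + u[:-2, :-2]) / (4.0 * h**2)
    integrand = 0.5 * (u_xx + u_yy) ** 2 - (1.0 - sigma) * (u_xx * u_yy - u_xy**2)
    # Riemann sum over the interior cells.
    return float(integrand.sum() * h**2)
```

A planar surface has zero energy under this measure, which is consistent with the later remark that the free-boundary solution is unique only up to a linear term ax + by + c.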




Theorem 1: The energy measure $I_1(p)$ is
invariant under linear transformation of the curve,
i.e., if $r_2(t) = c\, r_1(t) + d$, then $I_1(p_1) = I_1(p_2)$.


Theorem 3: There exists a unique solution
$v \in U$, where $U$ is a subspace of $H^2$, which minimizes
$I_2(u)$ with $u|_{\partial\Omega} = g$.

Theorem 1 is important in the reconstruction because
all similar shapes must have the same energy measure in
order for a certain shape to reach the minimum regardless of
its size. Consider the energy in an ellipse and a circle and
suppose both of them are planar. If the energy measure is
defined as

$$ E(p(t)) = \frac{k_\kappa}{2} \int_L \kappa^2(s)\, ds \qquad (7) $$

then the energy in a circle of radius $R$ is $k_\kappa \pi / R$, and
the energy in an ellipse is $k_\kappa f(a, b)$, where $a$ and $b$ are
the lengths of the two axes of the ellipse. So an ellipse with
larger $a$ and $b$ will have less energy than a circle with a
smaller $R$. This is the reason why by minimizing $E$, an ellipse
cannot be interpreted as a circle [7]. By minimizing $I_1(p)$, an
ellipse will be interpreted as a circle [1]. In implementation,
$I_1(p)$ is easily discretized as

$$ I_1(p(t)) \approx \frac{k_\kappa}{2} \sum_{i=1}^{N} \theta_i^2
 + \frac{k_\tau}{2} \sum_{i=1}^{N} \phi_i^2
 + \frac{k'_\kappa}{2} \sum_{i=1}^{N} (\theta_{[i+1]} - \theta_i)^2
 + \frac{k'_\tau}{2} \sum_{i=1}^{N} (\phi_{[i+1]} - \phi_i)^2 \qquad (8) $$

where $\theta_i$ is the external angle between two
successive sides of the approximating polygon, $\phi_i$ is the
angle between the normals of the two planes determined by
three successive sides of the approximating polygon, and
$[i+1] = (i+1) \bmod N$.
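The polygonal discretization described above can be sketched as follows. This is only an illustration under stated assumptions: the energy is taken to be the sum of squared external angles, squared angles between successive plane normals, and squared differences of successive angles, mirroring the four terms of the inner product; the names `discrete_curve_energy`, `planar_ellipse`, and the unit default factors are hypothetical, and only the convex case of the vector p(t) is handled.

```python
import numpy as np

def _angles(a, b):
    """Unsigned angle in [0, pi] between corresponding rows of a and b."""
    cross = np.linalg.norm(np.cross(a, b), axis=1)
    dot = np.sum(a * b, axis=1)
    return np.arctan2(cross, dot)  # arctan2(0, 0) is 0 for degenerate rows

def discrete_curve_energy(points, k_kappa=1.0, k_tau=1.0,
                          ku_kappa=1.0, ku_tau=1.0):
    """Energy of a closed polygon given as an (N, 3) array of vertices.

    k_kappa, k_tau   : energy factors (hypothetical default values).
    ku_kappa, ku_tau : uniformity factors (hypothetical default values).
    """
    p = np.asarray(points, dtype=float)
    sides = np.roll(p, -1, axis=0) - p           # side i joins vertex i to [i+1]
    next_sides = np.roll(sides, -1, axis=0)
    theta = _angles(sides, next_sides)           # external angles
    normals = np.cross(sides, next_sides)        # normals of successive planes
    phi = _angles(normals, np.roll(normals, -1, axis=0))
    d_theta = np.roll(theta, -1) - theta         # theta_[i+1] - theta_i
    d_phi = np.roll(phi, -1) - phi
    return float(0.5 * (k_kappa * np.sum(theta**2) + k_tau * np.sum(phi**2)
                        + ku_kappa * np.sum(d_theta**2)
                        + ku_tau * np.sum(d_phi**2)))

def planar_ellipse(a, b, n=64):
    """Sample n vertices of a planar ellipse with semi-axes a and b."""
    t = 2.0 * np.pi * np.arange(n) / n
    return np.stack([a * np.cos(t), b * np.sin(t), np.zeros(n)], axis=1)
```

Because only angles enter the sums, scaling or translating the polygon leaves the value unchanged, which is the invariance stated in Theorem 1. For a fixed number of sides, equal external angles minimize the sum of squares, so a sampled circle scores lower than a sampled planar ellipse.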


Theorem 2: There exists a unique minimum
value of $I_1(p)$, for all $p \in B$. Suppose the minimum
is reached by curve $r^*(t)$, then $r^*(t)$ has continuous
curvature and torsion.


The problem of minimizing $I_2(u)$ with an inhomogeneous
Dirichlet boundary condition can be reformulated as follows.
Suppose $g$ is smooth enough, and let $g \in H^2$, $v \in H_0^2$;
then any $u = v + g \in H^2$ is in the admissible space. The
problem becomes finding a $v \in H_0^2$ minimizing

$$ I_2(u) = \tfrac{1}{2} a(u, u) - f(u)
 = \tfrac{1}{2} a(v, v) - f(v) + a(v, g) + \tfrac{1}{2} a(g, g) - f(g) \qquad (9) $$

with homogeneous boundary condition
$v|_{\partial\Omega} = v_n|_{\partial\Omega} = 0$, which is equivalent
to $u|_{\partial\Omega} = g$, $u_n|_{\partial\Omega} = g_n|_{\partial\Omega}$,
where $a(u, v)$ is the energy inner product and $a(u, u) = I_2(u)$.


Next, the general cases are considered. Given boundary
and surface contours with piecewise continuous $\kappa_0(t)$, let the
external jump angles of $\kappa(t)$ be $\alpha_i$, $i = 1, 2, \ldots, n$,
and the jump angles of $\tau(t)$ at surface discontinuity junctions
be $\beta_i$, $i = 1, 2, \ldots, m$. Then the reconstruction is
formulated as backprojecting the contours into 3D by minimizing

$$ J_1(p(t)) = \sum_i I_1(p_i(t))
 + \frac{k_\beta}{2} \sum_i \left(\frac{\pi}{2} - \beta_i\right)^2
 + \frac{k_\alpha}{2} \sum_i \alpha_i^2 \qquad (10) $$

where $k_\beta$ is the orthogonal link factor [1], and $k_\alpha$ is the
energy factor of curvature jump angles. The term of the
jump angles in torsion (torsion jumps across surface
discontinuity boundaries), $\sum_i (\pi/2 - \beta_i)^2$, is part of the
orthogonal links between surfaces based on the principle of
maximal orthogonality between surfaces [1, 10]. Suppose $z^*(t)$
is the boundary and $c(x_i, y_i)$ are the points on the surface
contours determined by minimizing $J_1(p(t))$. Then the surface is
interpolated by minimizing

$$ J_2(u) = \iint_\Omega \left\{ \tfrac{1}{2} (\Delta u)^2
 - (1-\sigma)(u_{xx} u_{yy} - u_{xy}^2) \right\} dx\, dy
 + \sum_i \frac{\gamma_i}{2} \left[ u(x_i, y_i) - c(x_i, y_i) \right]^2 \qquad (11) $$

$$ u|_{\partial\Omega} = g(x, y) = z^* $$

The second term is interpreted as a set of vertical pins
scattered inside the boundary; the surface is constrained only
by ideal springs attached between the pin tips and the surface
(see [3]), where $\gamma_i$ is the spring constant.

Again we have similar results:

Theorem 1': The energy measure $J_1(p)$ is invariant under
linear transformation of the curve.

Theorem 2': There exists a unique minimum value of
$J_1(p)$, for all $p \in B$. Suppose the minimum is reached by
$r^*(t)$, then $r^*(t)$ has piecewise continuous curvature and torsion.

Theorem 3': There exists a unique solution $v \in U$, where $U$ is a
subspace of $H^2$, which minimizes $J_2(u)$ with
$u|_{\partial\Omega} = g$.

Now we consider 3D shapes with more than one surface.
Barnard [10] recently proposed a maximal orthogonality
principle for 3D shape recovery based on psychophysical
studies. This principle is modelled by putting ideal springs
(orthogonal links) at the corners of the surface discontinuity
boundaries. Note that in $J_1$ the term $\sum_i (\pi/2 - \beta_i)^2$
is an orthogonal link term. So $J_1$ becomes

$$ J_3 = J_1 + \frac{k_\gamma}{2} \sum_i \left(\frac{\pi}{2} - \gamma_i\right)^2 \qquad (12) $$

where $k_\gamma$ is the orthogonal link factor as in [1].

The finite element method is naturally suited to the 
problem of surface reconstruction from backprojected 3D 
contours because of the flexibility in the geometry of the 
method. Domains of complex shapes, boundary conditions, and 
nonuniform discretizations of the domain, all of which are 
features of the backprojected contours, can be easily handled 
in the finite element method. 


Another possibility for dealing with the inhomogeneous
Dirichlet boundary condition is the penalty method. The
boundary contours are treated the same as surface contours:
ideal springs are attached between the contours and the surface
at a set of discrete points. This then becomes a "free boundary"
problem, and the solution is only unique up to a linear term,
ax + by + c. To have a unique solution, there have to exist
three noncollinear points on the contours to uniquely
determine the linear term [3, 22]. This will always be
satisfied in practice. When the spring constraints are strong
enough, the solution will be close to the boundary.


For extremal boundaries, the normal to the boundary
contours on the x-y plane is the normal to the surface. This
can be handled by adding a penalty term to $I_2$, $J_2$:

$$ \frac{k_n}{2} \sum_i \left| \left. \frac{\nabla u(x_i, y_i)}{|\nabla u(x_i, y_i)|}
 \right|_{\partial\Omega} - n(x_i, y_i) \right|^2 \qquad (13) $$

where $n(x_i, y_i)$ are the unit normals to the extremal
boundary contour on the x-y plane at points $(x_i, y_i)$. The
constraints are imposed only at discrete points because the
implementation uses the finite element method.


If the boundary consists of partly extremal boundary, 
partly surface discontinuity boundary, we will have a mixed 
boundary value problem. It can be treated accordingly. 

Using the theory developed, we have proved that an 
ellipse will be interpreted as a circle, and skewed symmetric 
figures will be interpreted as real symmetry in 3D. Also 
several polyhedra and nonplanar polygon shapes have been 
successfully reconstructed. 

Curved 3D shapes have also been successfully 
reconstructed from 2D contour images. The inputs to the
programs are sets of 2D data points obtained by digitizing
the contour drawings with a digitizer. $I_1(p)$ or $J_1(p)$ is then
minimized by the Levenberg-Marquardt algorithm to
reconstruct the 3D shapes.

Abstract

An algorithm is developed to segment arbitrary boundary 
images into sets of boundaries which represent a single object, and 
to group together lines which correspond to a single object or 
object part. The algorithm is based on features which were found
to be used by humans in the early stages of visual processing, and
which have a high correlation with perceptually significant
aspects of images. In addition, the data structure used is based on
the image representation used in the primate visual cortex.

By using perceptually valid features, the algorithm is able to
enhance the perceptually significant edges in an image using
simple, local, parallel computations. It demonstrates that selective
processing can occur in the parallel stages of early visual
processing, without domain specific knowledge, iterative processing, or
top-down control of some mechanism to shift attention.

KEYWORDS: image segmentation, boundary images, visual 
psychophysics 

*This project was supported by grant IST 8409827 from NSF.

1. Introduction 

Images contain too much information for humans or
machines to process all of it in detail. The human visual system
solves this problem by performing an initial, cursory analysis of
the entire image which allows it to pick out automatically what
is important in the image (Treisman, 1986), and then to
selectively process that information in preference to the rest of the
information. That is, a rapid, parallel analysis of the entire image
indicates which regions are likely to contain the most useful
information. Then the following stages of analysis, which
require more focused, serial processing, can concentrate primarily
on the preselected regions. This can also be a useful approach for
a computer vision system, and in fact the parallel computation of
intrinsic images can be viewed as an example of the first stage
(Barrow and Tenenbaum, 1981). This paper describes another
type of processing which enables the early visual processing to
indicate which regions of an image are likely to contain the most
useful information, and to selectively process such regions in
parallel. This is accomplished by selectively processing image
features which have a high correlation with perceptually
significant aspects of an image. This is not a new approach. For
example, edge detection techniques pick out object boundaries and
other edges which are more perceptually significant than more
uniform image regions (Marr, 1982). However, the research in
this paper presents a new set of features which can enable strong
inferences about which regions of an image contain the most
perceptually relevant information.

Determining which aspects or features of an image contain
the most useful information, and should therefore be
preferentially processed, is difficult because there is a nearly
infinite number of potential features. The dimensions of physics
cannot necessarily be used to determine which features are relevant, as
perceptual features may lie along some other dimensions. For
example, the perceived color of a region depends not only on the
wavelength and intensity of the light reflected from it, but also
on the relative contrast between it and neighboring regions. So
how can perceptually relevant features be found?

The question is further complicated as one set of features
may be ideal for one task, but useless for another. One basic
machine vision task is to segment an image into different regions
which correspond to different objects, or object parts. This may be
possible based on the color, shading, texture and shape
information. But are the color and shading information necessary for
image segmentation? Not always, as humans can readily segment
simple line drawings or boundary images which lack that
information. So one way to study image segmentation is to study line
drawing perception, and as line drawings are much simpler than
natural images, this should make the selection of features easier.
Once features are found from line drawings, then it is possible to
test them in the analysis of natural images.

But even for simple line drawings, it is not obvious which
features should be used. As the goal is to find perceptually
significant aspects of an image, and then to determine which
features correlate with those aspects, it is desirable to determine
what aspects of an image have perceptual significance for humans.
It is not possible to just introspect about possible features, as the
relevant preattentive stages of human visual processing are not
available for conscious introspection (Julesz & Schumer, 1981).
The approach taken in this paper is to use psychophysical
experiments to explore preattentive vision and to discover image
features used by humans. Once potential features are found, their
usefulness is tested by developing a computational algorithm
based on them, and then testing the algorithm. The algorithm
developed here can segment arbitrary boundary images containing
both straight lines and curves. It is a simple, data-driven,
bottom-up approach, which requires no domain specific
knowledge, and demonstrates the importance of using
perceptually valid features.

2. Psychophysical Experiments 

The psychophysical experiments are based on the perceived
contrast of lines phenomenon (Walters and Weisstein, 1982a). The
patterns in Fig. 1 can be used to illustrate this phenomenon.
When viewed at low contrast the lines in the cube (Fig. 1a)
appear to have higher contrast than the lines in Fig. 1b. If these
differences in perceived contrast can be correlated with the
presence of particular image features, it would suggest that stimuli
with those features are processed differently from stimuli lacking
the features. In particular, stimuli having features associated
with high perceived contrast may be preferentially processed.
The aim of the psychophysical experiments was to isolate such
features. The experiments have been reported elsewhere (Walters
and Weisstein, 1982b; Walters, 1984, 1985), so only a brief
description is included here.

Figure 1 (panels a and b)

By examining many pairs of patterns designed to differ in
terms of various global and local properties, it was found that the
difference in perceived contrast did not correlate with any of the
global features. For example, closure, global connectivity,
perceived 3-dimensionality and objectness did not correlate with
perceived contrast. Local features were also explored, and the
presence of angles and the number of free line ends were ruled out.
The only two features that did correlate with perceived contrast
were line length, and the local connections between the ends of
the line segments. For lines which subtended less than one degree
of visual arc, perceived contrast was a positive function of line
length. For longer lines, there was no correlation between
perceived contrast and length. The other local property is the way in
which line ends are connected, and experiments show that there is
actually a hierarchy of end connections. Figure 2 shows the
results of one such experiment. The brightness of various patterns
formed of 30 minute line segments was measured relative to a
line which subtended 60 minutes of visual arc. Some of these
patterns could be referred to as the "L", "Fork", and "T" junctions
from the Huffman-Clowes tradition (Huffman, 1971; Clowes,
1971; Waltz, 1975). But it turns out that this is not the most
useful classification. As section 3.2 explains, it is better to classify
these patterns in terms of the spatial relations between the ends of
the lines.

From the results in Fig. 2 we can see that line segments
joined at their ends have higher contrast than segments where one
end abuts the middle of the other segment. And these abutting
lines have higher contrast than lines that intersect, while
intersecting lines have higher contrast than unconnected lines.

Further experiments found one additional pattern in the
hierarchy, as shown in Fig. 3. Two lines connected end-to-end
(pattern A) have higher contrast than three lines connected
end-to-end (pattern B), which have higher contrast than lines which
connect end-to-middle (pattern C), which have higher contrast
than the lines which intersect, which in turn have higher contrast
than parallel lines.



Figure 2 (axis label: Pattern)


3. Computer Model of Contrast Enhancement 

The psychophysical experiments provide evidence that the
length of lines, and the connections between the ends of lines, are
basic features for human vision. This hypothesis was further
tested by implementing it as a computer model. The model
receives a boundary image as its input, and outputs the
"perceived" contrast of the pattern.

3.1. Enhancement Rules

The model uses the presence of the different types of end
connections to implement the following enhancement rules.
1) Each section of line with length L is enhanced by amount "l".
2) Lines terminating at Type A connections are enhanced by
amount "a".
3) Lines terminating at Type B connections are enhanced by
amount "b".
4) Lines terminating at Type C connections are enhanced by
amount "c".
5) Amount "a" > amount "b" > amount "c".
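The enhancement rules above can be sketched in a few lines. The numeric amounts, the section length, and the name `line_enhancement` are hypothetical placeholders chosen only to satisfy rule 5; the paper does not give values, and counting whole sections of length L is one possible reading of rule 1.

```python
# Hypothetical amounts; only the ordering a > b > c comes from the rules.
END_ENHANCEMENT = {"A": 3.0, "B": 2.0, "C": 1.0}
LENGTH_ENHANCEMENT = 1.0   # amount "l" added per section (assumed value)
SECTION_LENGTH = 10        # section length L in pixels (assumed value)

def line_enhancement(length, end_types):
    """Total enhancement of one line.

    length    : line length in pixels.
    end_types : connection type at each end of the line, e.g. ["A", "C"];
                types without a rule (such as "D") add nothing.
    """
    sections = length // SECTION_LENGTH          # rule 1: whole sections of length L
    total = sections * LENGTH_ENHANCEMENT
    for connection in end_types:                 # rules 2 to 4
        total += END_ENHANCEMENT.get(connection, 0.0)
    return total
```

With these placeholder values, a 20 pixel line ending in one Type A and one Type C connection receives 2 + 3 + 1 = 6 units of enhancement.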

3.2. Detection of Features 

The presence of the features can be detected in an image by
defining the different end connections in terms of a discrete
geometry (Rosenfeld, 1979). The first step is to determine
whether each point in the image is part of a line. Thus points can
be labeled as either non-line or line points. The line points can
then be further broken down into end points and non-end
(middle) points. But consider the intersection of two lines. The point
that lies on the intersection can be considered to be a middle point
of one or the other line, or can even be considered an end point of
each of four shorter line segments. Thus some way of defining
the point is needed which avoids these ambiguities. An edge
detection technique could be used to label each point in the image
with the orientation and amplitude of the best line or edge
centered on that point. But at the intersection point, it is not so clear
what the best line would be. Some edge detection techniques give
the orientation of either one or the other line, while others give
an average of the two orientations. So, just at a point that
provides lots of information about the scene, the edge detection
methods don't give sensible answers.

Another problem in edge detection arises because many of
the popular approaches to edge detection in computer vision are
based on the use of the mathematics of continuous functions. This
creates problems in detecting the Type A connection, which is
defined in terms of a tangent discontinuity, because
discontinuities are problematic in the mathematics of continuous
functions. Poggio et al. (1985) have suggested that the solution to
this problem is to regularize the computation: for example, get rid
of the discontinuities by convolving the image with a Gaussian, and
then look for edges in the blurred image. The advantage of this
technique is that patterns can then be represented as smooth
continuous functions, but it is rather unfortunate from the contrast
enhancement point of view, as it gets rid of the tangent
discontinuities, which appear to be such important features for
early vision. So edge representation methods based on continuous
functions are not very useful for this model.

The solution to these edge representation problems can be
found by looking at how edges are represented in the primate
visual cortex. If an amplitude/orientation scheme were used,
there would need to be two "edge" neurones in the primary visual
cortex for each retinal ganglion cell: one to signal the amplitude
of the edge at that point, and another to signal the orientation of
the edge. But the cortex does not have that organization. Instead
there is a whole column of edge neurones for each spatial location,
and each neurone is sensitive to edges with a narrow range of
orientations (Hubel and Wiesel, 1968). So, instead of just signalling
the "best" amplitude and orientation, the orientation column
signals the amplitude at many orientations. This representation has
several advantages over the amplitude/orientation scheme. For
example, at the intersection of two lines, both orientations can be
represented, which would disambiguate the pattern.

Figure 4


The data structure used in the general enhancement
algorithm is based on the orientation representation of the
mammalian visual system. Figure 4(a) shows the basic form of the
data structure. It is an orientation plane representation: a 3-d
space where each point represents a short piece of line having a
specific orientation and located at a specific x-y location. An image
is transformed into this representation by convolving the image
with a separate oriented edge kernel for each orientation plane. It
is possible to construct any number of separate orientation planes
for this representation. In the current implementation 8 or 16
orientation planes are used. In addition, if boundaries are present
at different scales in the image, then a separate orientation plane
structure is needed for each scale. This would be required for
grey-level images, though not for line drawings containing a
single width of line.
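A minimal sketch of building such an orientation plane structure is given below. The kernel shape is an assumption: a short normalized line segment through the kernel centre is used here as a stand-in for the oriented edge kernels, whose exact form is not given in the text. The names `oriented_kernel` and `orientation_planes` and the 5 x 5 kernel size are likewise hypothetical.

```python
import numpy as np

def oriented_kernel(theta, size=5):
    """Normalized line-segment kernel at angle theta (radians); assumed form."""
    half = size // 2
    kernel = np.zeros((size, size))
    for s in np.linspace(-half, half, 4 * size):
        row = int(round(half - s * np.sin(theta)))
        col = int(round(half + s * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()

def orientation_planes(image, n_planes=8, size=5):
    """Return an (n_planes, H, W) stack, one response plane per orientation.

    Plane k holds the response to a kernel oriented at k * pi / n_planes.
    The kernels are symmetric about their centre, so correlation and
    convolution coincide here.
    """
    img = np.asarray(image, dtype=float)
    half = size // 2
    padded = np.pad(img, half, mode="constant")
    rows, cols = img.shape
    planes = np.zeros((n_planes, rows, cols))
    for k in range(n_planes):
        kernel = oriented_kernel(np.pi * k / n_planes, size)
        for dr in range(size):
            for dc in range(size):
                if kernel[dr, dc] != 0.0:
                    planes[k] += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return planes
```

Each (x, y) location then carries a whole column of responses, so two lines crossing at one point can both be represented, one in each of their orientation planes.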

One advantage of the orientation plane representation is that
it makes it easy to define lines and find tangent discontinuities. A
line is defined to be a set of dark pixels such that each pixel is
connected to neighboring pixels of similar orientation. The specific
definition of ‘connected’ is orientation dependent. Line pixels can
only be connected to other line pixels which lie within a certain
x,y distance, and a certain orientation distance, and the greater the
x,y distance, the greater the possible orientation distance which
can yield a connection. These definitions can be used to label all
the pixels in an image as either nonline, end ‘e’ or middle ‘m’
points.

Connections can be defined in terms of these ‘e’ and ‘m’
labels. For example, for a Type A connection located at point
(x1,y1), examining the x1,y1 position in each plane would yield
exactly two ‘e’ labels, and no others. (Actually, the examination
may involve a small neighborhood around the (x1,y1) point.) This
suggests how to define the different connections in terms of the ‘e’
and ‘m’ labels.

3.3. Completeness of the Feature Set 

Another question that needs answering is, are these features
geometrically complete? Do they cover the space of all possible
connections? This question can be answered in terms of all the
possible combinations of ‘e’ and ‘m’ labels. Figure 5 shows all the
possible connections for straight lines of just three possible
orientations, in terms of the number of ‘e’ and ‘m’ labels at the center
of the connections. The upper left three are not connections. The
upper right two are both intersections, which receive no contrast
enhancement. Three of the connections were used in the
psychophysical experiments, and there are two additional connections
needed to cover the space. The new connections are hypothesized
to belong to the classes as labeled.

Figure 5

There are only a limited number of orientations in Fig. 5.
For more orientations, the set would be extended, and the labels
can also be extended. Everything out to infinity in the top row is
a Type D connection. Everything out to infinity in the second
row is a Type C connection. And everything out to infinity in
the other rows is a Type B connection. It is important to be able
to label all possible types of end connections, but at the same time,
the probability of any of the higher order types occurring in a
natural scene is very small, thus they are not as important as
the few seen in Fig. 5. It can now be seen that the set of
connections is complete, and that there is a means of detecting the
presence of the features, as all connections can be classified into the
four perceptually valid classes using these rules.
1) All connections with exactly two ‘e’ labels are of type A.
2) All connections with two ‘e’ labels, and at least one additional
‘e’ or ‘m’ label, are of type B.
3) All connections with exactly one ‘e’ label and one or more ‘m’
labels are of type C.
4) All connections with no ‘e’ labels and two or more ‘m’ labels
are of type D.
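The four classification rules translate directly into a lookup on the label counts at a point. The sketch below assumes the counts of ‘e’ and ‘m’ labels across the orientation planes have already been gathered for the point; the function name `classify_connection` and the use of `None` for points that are not connections are choices made for this example.

```python
def classify_connection(n_e, n_m):
    """Classify a point from its counts of 'e' and 'm' labels.

    Returns "A", "B", "C" or "D", or None when the point is not a
    connection (for example a lone line end or a lone middle point).
    """
    if n_e == 2 and n_m == 0:
        return "A"      # rule 1: exactly two 'e' labels and no others
    if n_e >= 2:
        return "B"      # rule 2: two 'e' labels plus at least one more label
    if n_e == 1 and n_m >= 1:
        return "C"      # rule 3: one 'e' label and one or more 'm' labels
    if n_e == 0 and n_m >= 2:
        return "D"      # rule 4: no 'e' labels and two or more 'm' labels
    return None
```

For instance, a "T" configuration where one line end abuts the middle of another line has one ‘e’ and one ‘m’ label and is classified as Type C, while a plain crossing of two lines has two ‘m’ labels and is Type D.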

3.4. Dealing with Curved Boundaries 

The examples thus far have dealt only with line drawings
containing straight lines. To be useful the algorithm should be
able to deal with curves as well. Does the perceived contrast of
curves also vary with the type of end connections present?
Further psychophysical experiments confirmed this hypothesis.
Figure 6 shows the hierarchy of end connections for curved lines,
with the lines on the right having the highest perceived contrast,
and the lines on the left having the lowest. The results for the
curved lines are identical to the results for the straight line
segments, although curved lines have a possibility of one further
type of end connection, as shown in the A* pattern. With curved
segments, two segments can merge or join into one, without a
discontinuity of the tangent. So, one additional rule must be added
to the algorithm to deal with such connections.

It is easy to see the relation between these curved
connections and the straight line connections, but what is the relation
between a Type A connection, and the same pattern with the
discontinuity slightly smoothed, as in Fig. 4? The principle of
stability argues for continuity of interpretation when small
perturbations are made in an image. Thus when the right angle is
perturbed to form the smoothed angle, the interpretation should
remain similar. In order to have the same orientation plane
interpretation a means of defining the end connections of curved
lines is needed. The solution is that end connections occur when a
line passes through J orientation planes within K pixels; J and K
are variables which determine the sensitivity to curves.
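The J and K rule can be illustrated with a sliding window over a traced line. The sketch assumes the line has already been traced into a sequence giving, for each successive pixel, the index of the orientation plane it lies in; that input format, the function name `curved_end_connections`, and the default values of J and K are all hypothetical.

```python
def curved_end_connections(plane_indices, J=3, K=5):
    """Find where a traced line crosses at least J planes within K pixels.

    plane_indices : orientation-plane index of each successive pixel
                    along the line (assumed input format).
    Returns the start positions of every K-pixel window that contains
    at least J distinct orientation planes.
    """
    hits = []
    for start in range(len(plane_indices) - K + 1):
        window = plane_indices[start:start + K]
        if len(set(window)) >= J:
            hits.append(start)
    return hits
```

A straight line stays in one plane and produces no hits, a gentle curve changes plane too slowly to reach J planes in any window, and a slightly smoothed corner sweeps through several planes in a few pixels and is therefore reported. Raising J or lowering K makes the test less sensitive to curves.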
3.5. Model Implementation

The model can be implemented in parallel. Imagine a simple
processor at each pixel in an image, such that each processor
receives input only from a small neighborhood of pixels. Each
processor can compute whether a particular spatial relation
between the ends of lines is centered in its neighborhood, and if
so, can send the appropriate enhancement out along the
appropriate pixels. Note that it is not the mechanism of contrast
enhancement that is being modeled; it is the overall computation
that is of concern.

4. Model Results 

Figure 7a shows the output of the computer model for four
of the psychophysical patterns. The results are displayed in terms
of a threshold. The highest threshold, that is, the highest contrast
lines, is at the top. The threshold becomes lower in each
subsequent line. At the bottom is the lowest threshold, where all of
the lines which were present in the patterns appear.
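The threshold display described above amounts to listing, for each threshold from highest to lowest, the lines whose computed contrast reaches it. The sketch below is a hypothetical illustration; the function name `lines_by_threshold`, the line names, and the contrast values are invented for the example and are not model output from the paper.

```python
def lines_by_threshold(contrast, thresholds):
    """List the visible lines at each threshold, highest threshold first.

    contrast   : dict mapping a line identifier to its computed contrast.
    thresholds : iterable of threshold values.
    Returns a list of (threshold, sorted line identifiers) rows.
    """
    rows = []
    for t in sorted(thresholds, reverse=True):
        visible = sorted(name for name, c in contrast.items() if c >= t)
        rows.append((t, visible))
    return rows

# Hypothetical contrast values for three kinds of line.
example = {"outer contour": 5.0, "inner contour": 3.0, "texture": 1.0}
display = lines_by_threshold(example, [1.0, 3.0, 5.0])
```

In this made-up example only the outer contour survives the highest threshold, and all three lines appear at the lowest one, matching the top-to-bottom layout of the figure.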

The model results agree with the experimental results for 
all of the patterns used in the psychophysical experiments. Thus 
the hypothesis that perceived contrast is a function of line length 
and the type of connections between the ends of lines, is further 
supported. 

4.1. Uses of Features Suggested by the Model 

A further use of the computer model is to go beyond the 
psychophysical results. One limitation with the human experi¬ 
ments is that subjects are only able to make global judgements of 
contrast - they can say pattern A was overall brighter than pattern 
B, but they cannot say whether a particular line in pattern A 
was brighter than the others. But the computer model gives such 
results, which can help in determining why such processing 
might be useful. 

Actually, some of the possible uses can be seen with these 
simple stimuli of Fig. 7a. Lines which are part of object contours 
are enhanced relative to lines which form texture, or are perceived 
as noise. And, the outer contours of objects are enhanced 
relative to the inner contours. 

Looking at another example (Fig. 7b) shows how the model 
goes one step further. Figure 7b contains two distinct objects, one 
partially occluding the other. At the highest threshold the lines 
composing the two objects are not spatially continuous. The 
model selectively enhances objects in the foreground, and helps to 
group lines into two sets which correspond to the two objects. 

Figure 7c shows the model results for an impossible object. 
For the possible object, the model results at an early level (level 2) 
show the main properties of the object - a blob with a hole in it. 
That is, the model is giving the topological structure of the object 
at a very early level. But with the impossible object, at the first 
level it is represented as a single object, then as one object with 
two sub-objects, or object components. The model again agrees 
with our perception of one object if we look at one corner of the 
drawing, and another if we look at the diagonally opposite corner, 
and neither we nor the model can get the perceptions to merge. 

5. Uses of Image Features 

The results of the computer model give more support to the 
hypothesis that the length of lines, and the spatial relations 
between the ends of lines, are perceptually valid features. But 
how should these features be USED? 

5.1. Current Approaches 

Various theories concerning the use of features have been 
proposed in computer vision. One conceptually simple use of 
features is to represent objects in terms of a list of features (Feld¬ 
man, 1985). A model of an object can be expressed in terms of 
features and the relations between them, and then portions of an 
image can be compared to the model to see if the object is present. 
This is similar to the way line drawing junctions were first used 
by Roberts (1965). But this use involves domain specific 
knowledge, which is a major drawback as it is thus not easily 
extendible to deal with arbitrary images. 

Guzman (1979), Kanade (1981), Draper (1981), and Lee et 
al. (1985) have used very similar line drawing features in their 
boundary image interpretation algorithms. The features are used 
in various constraint satisfaction systems. This paper presents 
another, related use of end connection features, which is not lim¬ 
ited to trihedral vertices, and accomplishes a somewhat different 
task. 


5.2. Selective Enhancement 

A different use of features is suggested by the psychophysical 
experiments and computer model. Lines appear to be selectively 
enhanced based on the presence of a few basic features. 
(The potential usefulness of this enhancement is described in the 
next section.) It appears that selective enhancement is possible, 
even in the automatic parallel stages of processing. This requires 
no top-down processing, no domain-specific knowledge, and no 
iterative processing. 

6. Perceptual Significance of Selective Enhancement 

Why should such parallel selective enhancement be useful? 
The computer model provided some hints. The outer contours of 
objects were enhanced more than the inner contours, and object 
contours were enhanced more than lines interpreted as texture or 
noise, and the highest contrast lines were correctly segmented. 
But why would these results be helpful? 

The selective enhancement of outer contours is important for 
object recognition. An object can usually be recognized just from 
its silhouette, which is simply its outer boundary - outer contours 
have a special perceptual significance. And the edges of a 
silhouette can only contain type A connections. This may be the 
reason that end-to-end connections appear to receive the most con¬ 
trast enhancement. And this supposed correlation between type A 
connections and the outer contours of objects makes it possible to 
infer that the most enhanced lines in an image have a high proba¬ 
bility of having arisen from the occluding contours of objects. 

The type B connections can arise from either an inner or 
outer contour of an object, and thus do not have as strong a corre¬ 
lation with outer object boundaries. Even when we divide the 
type B connections into forks and arrows as Chakraverty and oth¬ 
ers do (Chakraverty, 1979; Lee et al., 1985), they can still both 
arise from both types of object contours. 

Yet, if two simple assumptions are made, both type A and 
type B junctions have a high correlation with object contours in 
general. 

The first assumption is: 

Assumption 1: Viewing position is representative. 

(This means we assume we are looking at an object along a view¬ 
ing direction which is not one of the few viewing directions 
which results in the accidental alignment of object boundaries or 
wires in a scene. (Binford, 1981; Cowie, 1982)) 

Result 1: Two or more lines meeting at a junction should be 
interpreted as two or more wires or object boundaries that meet. 

Assumption 2: Object position is representative. 


(This means we assume objects or wires in a scene are not acciden¬ 
tally aligned. The first assumption concerns looking at objects in 
such a way as to make them appear to be accidentally aligned. 
The second assumption concerns cases where the objects are in 
some form of accidental alignment with each other, independent 
of the viewing position.) 

Result 2a: Line ends meeting at a point should all be interpreted as 
having arisen from the same object. 

Result 2b: Two or more line middles falling on a point should be 
interpreted as wires or texture boundaries. 

Result 2c: Connections containing both ends and middles are most 
generally interpreted as object boundaries that either occlude or 
meet other object boundaries. 

Result 2d: The end line in a connection should be interpreted as 
arising from a different object from the middle lines. 
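
Results 2a to 2d amount to a small lookup on what meets at a point. The sketch below restates them as a function; the representation of a connection as a count of line ends and line middles, and the name `interpret_connection`, are assumptions made for illustration.

```python
def interpret_connection(ends, middles):
    """Most general interpretation of a connection under the two assumptions.

    ends    -- number of line ends meeting at the point
    middles -- number of line middles passing through the point
    """
    if ends >= 2 and middles == 0:
        # Result 2a
        return "line ends arise from the same object"
    if ends == 0 and middles >= 2:
        # Result 2b
        return "wires or texture boundaries"
    if ends >= 1 and middles >= 1:
        # Results 2c and 2d
        return ("object boundaries that occlude or meet; "
                "end line arises from a different object than the middle lines")
    return "no connection"
```

These are the most general interpretations only; as noted in the text, accidental alignments will occasionally be interpreted incorrectly.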

The assumptions about nonaccidental alignment do not mean 
that images with accidental alignment cannot be enhanced or seg¬ 
mented using this algorithm. It just means that the most general 
interpretation for a basic feature will be utilized. Thus in the 
majority of cases, the correct interpretation will arise, while a 
few cases may exist where the algorithm gives an incorrect 
interpretation. 

From these assumptions we see that in Type A and B con¬ 
nections the lines have a high probability of having arisen from 
the same object. This result makes these connections the most use¬ 
ful for grouping together lines which correspond to a single object 
or object part. This result is also useful for segmenting the image 
into sets of lines which represent a single object. Note that this 
type of segmentation is different from Richards’ segmentation of 
curves (Richards & Hoffman, 1983). His work shows how to seg¬ 
ment the curves of an object into sets which correspond to various 
object parts, but not how to segment an entire image into different 
objects. 

So, from the contrast enhanced lines, certain inferences about 
the line drawing can be made. Why would these inferences be 
useful for a visual system? Well, as previously 
mentioned, a major problem for a visual system is that there is 
too much information in a visual image to process all of it in 
detail. One solution is to have some automatic preprocessing sys¬ 
tem which determines which lines or areas contain the most 
important information, and then to concentrate the serial process¬ 
ing on those areas, while ignoring other potentially less fruitful 
areas. This model automatically enhances those lines which have 
a high probability of being part of object contours, rather than 
just part of texture or noise. If the next stage of processing has to 
be selective, it can "attend" only to the enhanced lines and thus 
not waste resources processing spurious edges. But note that some 
stages of the selective processing can be done in parallel, and they 
do not require the top-down control of some mechanism to shift 
attention (Ullman, 1986). 

7. Additional Perceptual Effects 

Before describing the results of implementing the enhance¬ 
ment algorithm for curves and straight lines, a couple of other 
perceptual effects incorporated into the algorithm need to be 
described. The connections between the ends of straight and 
curved lines are basic features for the human visual system. The 
human visual system is also sensitive to virtual edges such as the 
ones seen in Fig. 8a. Thus the enhancement algorithm may be 
improved by including the ability to deal with virtual edges. 
Again the orientation plane representation makes it easy to 
represent virtual edges, by allowing edges to grow perpendicu¬ 
larly from the ends of lines. This is similar to the boundary con¬ 
tour completion of (Grossberg and Mingolla, 1986). 

Figure 9 demonstrates one further trick of the human visual 
system, which is incorporated into the algorithm. The figure is 
perceived as representing a single diagonal line, which appears to 
pass behind the surface represented by the vertical lines. Thus 
pairs of Type C connections, when aligned and having a particu¬ 
lar symmetry can be interpreted as occlusion of a single line. A 
related approach has been used by Lee et al. (1985) to find hidden 
vertices in line drawings. 

8. Segmentation Examples 

The contrast enhancement algorithm can be implemented 
using the orientation plane representation. Figure 10a shows some 
examples of applying the segmentation algorithm to 2-D, origami, 
and 3-D objects. The drawings are correctly segmented in all 
three cases, as indicated by the different line styles for the 
different objects. Again the outer contours of the objects are 
enhanced more than the inner contours, and objects in the 
foreground are enhanced relative to occluded objects. Note that 
the one set of rules can deal with the three separate domains. 

Another example is seen in Fig. 10b, where the first and 
second images differ only by a single line segment, yet the second 
alone is represented as two separate objects. Note that this segmentation 
is indicated early in the process - i.e., at Level 2. Thus even 
at this early stage the segmentation is correct. Note that the seg¬ 
mentation is performed without using either implicit or explicit 
models of objects, and the top-down processing that model match¬ 
ing requires. This is different from most current algorithms, and 
enables any boundary image to be processed. 

9. Grouping Performance on an Arbitrary Line Drawing 

To demonstrate the ability of the enhancement and grouping 
algorithm to deal with an arbitrary line drawing, a cartoon from 
the New Yorker was processed in accord with the algorithm. Figure 
11a shows the original cartoon. In Fig. 11b, only the most 
enhanced groups of lines are displayed - those involved in Type A 
and A* connections. Only 61 of the total of 86 lines are present, 
yet object recognition is possible. If just the remaining 25 lines 
are displayed, object perception is not possible - which is weak 
evidence that the algorithm picks out the most perceptually 
salient lines. 

The grouping at this stage is depicted by the different line 
styles. Sixteen of the twenty-three separate objects or object parts 
are represented at this stage. (Due to reproduction limitations, 
only four line styles are used in the figure, however each instance 
of line style indicates a separate set of lines.) Again the algorithm 
is effective in reducing the complexity of the drawing in terms of 
the number of lines, without diminishing the grouping capabili¬ 
ties. 

Figure 11c shows the final grouping of the cartoon. The sets 
all correspond to object or object parts that are readily named by 
humans: e.g., ’crown’, ’robe’, ’cuff’, ’sleeve’, ’foot’, etc. There are no 
groups which would have to be described as "the upper right 
hand portion of object x", which again suggests that the grouping 
has perceptual significance. The algorithm could be used as a 
powerful preprocessor for a scene analysis system, as it accom¬ 
plishes a lot, given just a handful of simple rules. Later stages of 
analysis could use the enhanced sets of lines as input to an object 
recognition algorithm (Pentland, 1985; Biederman, 1985). 

10. Texture Boundaries 

Up to this point the spatial relations between the ends of 
curves and lines have been discussed, which are one type of boun¬ 
dary found in images. But the algorithm could equally well be 
applied to other boundaries - for example, texture boundaries. 
Instead of using lines or edges as the input to the algorithm, the 
boundaries defined by differences in texture could be contrast 
enhanced in accordance with the spatial relations between their 
ends. This is an interesting example, because it may be that the 
end-connections are used twice in this type of analysis. Julesz has 
many impressive experiments designed to uncover the features 
used by humans in texture segregation, which he calls textons 
(Julesz & Bergen, 1983). At last count, the texture features 
include color, elongated blobs (which includes lines), their free 
terminators, and crossings. Julesz’ rejection of global properties 
such as closure, and global connectivity for preattentive texture 
discrimination agrees with my findings that these properties are 
not relevant to changes in perceived contrast. And as there is a 
strong correlation between the number of free terminators, and 
the type of end-connections, it may be that the hierarchy of end 
connections found to alter perceived contrast can equally well 
explain texture segregation of patterns composed of lines and 
curves. 

11. Natural Images 

A technique for finding texture boundaries in natural images 
is necessary before the selective enhancement algorithm can deal 
with natural images which contain regions defined by texture 
edges rather than intensity edges. But can the selective enhance¬ 
ment algorithm work for natural images which do not contain 
texture boundaries? We are currently addressing this question by 
implementing the algorithm for natural images. 

There are two basic problems when applying line drawing 
algorithms to natural images. First is the problem of extracting 
the edges from the image, leaving the connection information 
intact. We are developing edge detection techniques, similar to 
those of Canny (1984), to alleviate this problem. A related problem 
is that many spurious edges are found with many edge detection 
techniques. The lack of contrast enhancement for short edges, 
and the selective enhancement of end connected edges both reduce 
such problems. 

The other problem is that extracted edges are often noisy and 
contain large gaps. But a similar problem is found in many car¬ 
toons - end connections or lines are implied but not explicitly 
present. The selective enhancement algorithm uses the creation of 
virtual edges to solve this type of problem, and it may be possible 
to extend this solution to natural images. 

12. Conclusion 

The end connections of lines and curves appear to be basic 
features which allow bottom-up processing of boundary images 
using a single set of simple rules. The contrast enhancement algo¬ 
rithm suggests that certain selective processing can be performed 
in the parallel stages of preattentive processing. It is a data-driven 
approach which accomplishes tasks previously thought to require 
domain specific knowledge. Object models are obviously necessary 
for some stages of object recognition, but the contrast enhancement 
algorithm demonstrates that some steps which were previously 
thought to require model matching, do not. This shows how productive 
bottom-up processing can be when psychophysically valid 
features are used in perceptually valid ways. 

Abstract 

The apparent spatial frequency and orientation organiza¬ 
tion of the primary visual cortex is used as a basis for 
texture description. A Frequency, Orientation, neural 
firing Rate, and spatial Phase (FORP) representation is 
proposed for the analysis of natural textures and the 
synthesis of test textures. Higher order texture analysis 
such as discrimination and segmentation in this FORP 
space is discussed. 

Keywords: texture analysis, vision models, primary 
visual cortex models 

Introduction 

Many researchers have tried to duplicate the ability of 
the human visual system to segment natural scenes 
based on the different texturing of surfaces. We choose 
to define texture as that property of surfaces that can be 
described by the local pattern of spatial variation of 
intensity. We also take as different textures those that 
can be differentiated by human perception. In this light, 
we base our analysis of texture on a model of the early 
human visual system. In particular, we choose a model 
of the primary visual cortex since both orientation and 
spatial frequency information exist here. 1 

We use this information in an orthogonal feature 
extraction space to generate simple test textures and 
analyse natural textures. 

Texture and the Visual Cortex 

Given that the primary visual cortex is well suited to 
texture computations, is there any evidence that it actu¬ 
ally performs them? The following evidence indicates 
that this could indeed be true. 

First of all, Julesz 2 mentions the high speed 
discrimination ability of human subjects. This implies 
that texture discrimination is a low level visual function 
and must be early in the visual chain. Kimchi and 
Palmer, 3 through the use of perceptual experiments 
found that texture is processed separately in the visual 
system from shape and structure. Lastly, Berlucchi and 
Sprague 4 use lesion studies to deduce that shape and 
structure encoding does not exist in the primary visual 
cortex. They suggest the primary visual cortex could be 
used for texture analysis. 

The primary visual cortex appears well suited for 
texture analysis, it appears to actually perform texture 
computations, and it can use this information to improve 
image segmentation. 

A Visual Cortex Model 

Pollen and Ronner 1 describe a model of the primary 
visual cortex which outlines various functions of cortical 
neurons. These functions include the retinotopic spatial 
map, ocular dominance, orientation, spatial frequency 
and spatial phase. Hubel and Wiesel 5 first described a 
small separate processing region in the primary visual 
cortex as the hypercolumn. The hypercolumn was 
responsible for analysing a small area of the visual field 
for orientation information. The hypercolumn has 
become the name for the 0.5 mm wide cortical area that 
contains orientation and frequency selectivity neurons 
for a small visual region. There exist hypercolumn 
regions to cover all of the visual field. 

Our model of the primary visual cortex is based 
on this hypercolumn structure. One hypercolumn is 
modeled as a three dimensional space. The space con¬ 
sists of a spatial frequency axis, an orientation axis, and 
a spatial phase axis. Each point in this space 
corresponds to a neuron that is selective to a particular 
set of frequency, orientation, and phase. A magnitude 
component is also included in this space to account for 
the strength of response to this particular set. The mag¬ 
nitude is coded by the neuron as a neural firing rate. 
This model is a modified two dimensional Fourier space 
with all symmetric regions removed. The model has been 
named FORP (spatial Frequency, Orientation, neural 
firing Rate, and spatial Phase). 
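
One way to picture the conversion of an image window into FORP space is sketched below. It is a minimal illustration, not the tool used to produce the figures: it assumes a square window given as a list of rows, uses a direct (slow) two dimensional discrete Fourier transform, keeps one half of the Fourier plane, and measures orientation as the angle of the frequency vector from the vertical. The function names and the `min_rate` cut-off are hypothetical.

```python
import cmath
import math


def dft2(window):
    """Direct 2-D discrete Fourier transform of a square window (list of rows)."""
    n = len(window)
    spectrum = {}
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2.0 * math.pi * (u * x + v * y) / n
                    total += window[y][x] * cmath.exp(1j * angle)
            spectrum[(u, v)] = total
    return spectrum


def forp_points(window, min_rate=1e-6):
    """Return (frequency, orientation_deg, rate, phase) tuples for one window.

    The spectrum of a real image is conjugate-symmetric, so only one
    half-plane is kept. Nyquist rows get no special handling in this sketch.
    """
    n = len(window)
    kept = []
    for (u, v), value in dft2(window).items():
        fu = u if u <= n // 2 else u - n    # signed horizontal frequency
        fv = v if v <= n // 2 else v - n    # signed vertical frequency
        if fv > 0 or (fv == 0 and fu > 0):  # half-plane, DC term excluded
            kept.append((fu, fv, value))
    peak = max((abs(value) for _, _, value in kept), default=0.0)
    if peak == 0.0:
        return []
    points = []
    for fu, fv, value in kept:
        rate = abs(value) / peak                       # normalised to 0..1
        if rate < min_rate:
            continue
        frequency = math.hypot(fu, fv) / n             # cycles per pixel
        orientation = math.degrees(math.atan2(fu, fv)) % 180.0  # 0 = vertical
        phase = cmath.phase(value) % (2.0 * math.pi)   # 0 .. 2*pi
        points.append((frequency, orientation, rate, phase))
    return points
```

A window containing a single horizontal-running sinusoid yields a single point with rate 1; richer textures yield a pattern of points whose shape is what the model compares.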

The FORP based tools that were developed 
allowed us to both synthesize and analyse textures. One 
of the tools allowed a window of a texture image to be 
chosen and be represented in FORP space. The other 
allowed the experimenter to place individual points or 
groups of points in the space and then have the texture 
generated that corresponded to this pattern of firing 
neurons. Figures 1 thru 6 are outputs of these pro¬ 
grams. 

Figure 1 shows a representation of FORP space 
and the corresponding synthesized texture. It shows the 
three axes of frequency (range of 0 to Fs/2), orientation 
(range of 0° to 180°), and neural firing rate (normalized 
to a range of 0 to 1). There is a grid on the 
frequency/orientation plane, and the intersection of two 
grid lines corresponds to a neuron or a set of neurons. 
The intensity of the bar at a frequency/orientation point 
represents the phase. The lowest intensity corresponds 
to a phase of 0 and the highest to a phase of 2π. If 
there were a set of phase sensitive neurons then the bar’s 
intensity would represent which phase neuron in this set 
was responding maximally. The image beside the graph 
contains a homogeneous texture generated from the 
FORP space data. The little square in the bottom left 
corner of the image delineates the local region that is 
represented by the FORP space. The remainder of the 
texture image is a mosaic-like repetition of this small 
subimage. 

To demonstrate the nature of FORP space we 
have performed a number of simple texture syntheses. 
In Figure 1, a single point is placed at a frequency of 
Fs/16 (that is, 1/16th of the spatial sampling frequency), 
and an orientation of 110° (note that 0° is vertical). 
The resulting synthesized texture is a spatial sinusoid at 
110° and has 2 periods in its 32 by 32 sub-window. This 
point is moved along the frequency and orientation axes 
(Figure 2). Note that the frequency change modifies the 
sinusoidal spacing and the orientation change modifies 
the sinusoid’s angle. 
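
The synthesis of a texture from a single FORP point can be sketched as follows. The sketch assumes that frequency is expressed in cycles per pixel (so Fs/16 is 1/16), that orientation is measured from the vertical, and that it gives the direction along which the sinusoid varies; the paper does not spell out the last convention, so it is an assumption, as are the function names.

```python
import math


def synthesize_point(size, freq_cycles_per_pixel, orientation_deg, phase=0.0):
    """Texture window produced by one FORP point: a single spatial sinusoid."""
    theta = math.radians(orientation_deg)
    # Assumed convention: 0 degrees is vertical, and the angle gives the
    # direction along which the intensity varies.
    dx, dy = math.sin(theta), math.cos(theta)
    return [[math.cos(2.0 * math.pi * freq_cycles_per_pixel * (x * dx + y * dy) + phase)
             for x in range(size)]
            for y in range(size)]


def periods_in_window(size, freq_cycles_per_pixel):
    """Number of sinusoid periods that fit across a window of the given size."""
    return size * freq_cycles_per_pixel


window = synthesize_point(32, 1.0 / 16.0, 110.0)
print(periods_in_window(32, 1.0 / 16.0))
```

The period count agrees with the text: a frequency of 1/16 cycle per pixel across 32 pixels gives 2 periods. Changing the frequency argument changes the spacing of the sinusoid, and changing the orientation argument rotates it.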

To answer the question, "What will natural textures 
look like in FORP space?", we performed some 
texture analyses shown in Figures 3 thru 6. Images of 
the natural textures oriental rattan, 6 diatom, 7 and 
J-Cloth™ were each sampled with windows at different 
positions. The window size was 64 by 64 taken out of a 
256 by 256 image, except the diatom samples which were 
done with a 32 by 32 window. The most important 
aspect of these FORP representations is their similarity 
in shape for the same texture, and their dissimilarity for 
different textures. In Figures 3 and 4, both FORP space 
representations of oriental rattan look quite similar. The 
majority of neural response is along the 0°, 90° and 
180° lines. This is understandable when one realizes 
that oriental rattan is composed mostly of horizontal and 
vertical edges. In Figure 5 the FORP space of diatom 
looks quite different from that of rattan. The neural 
responses are spread quite evenly in orientation. This is 
to be expected of a texture that is composed of circles, 
since a circle has edges of all orientations. Again, in Figure 
6 the FORP space of J-Cloth™ is quite different 
from previous ones. 

D’Astous 8 comments on this similarity of fre¬ 
quency domain representations and its importance: "... 
the power spectrum is fairly invariant to minor changes 
in structure caused by either low magnitude additive 
noise, or by small deviations in the periodicity of the tex¬ 
ture. This is of particular relevance to the problem of 
discriminating natural textures which tend to be noisy 
and, though many textures exhibit regularity to a certain 
extent, are not strictly periodic." † 

Required Accuracy 

The accuracy of the FORP parameters was studied to 
see how it affected the representation of textures. A 
window of a texture was converted to FORP space at 
various frequency and orientation accuracies. This 
representation was then used to generate a texture 
which was compared to the original. 

The amount of orientation information required 
depended on the type of texture. A synthetic texture 
with only horizontal and vertical edges required only 2 
levels of orientation whereas the diatom texture required 
at least 18 levels (each 10° wide). It was felt that 
20 or more orientation levels would be sufficient for most 
textures. 
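
The relation between the number of orientation levels and their width is simple arithmetic: the 180° orientation range divided by 18 levels gives 10° per level. A small sketch, assuming equal-width levels and using hypothetical function names:

```python
def level_width(levels):
    """Width in degrees of each orientation level over the 0-180 range."""
    return 180.0 / levels


def orientation_level(angle_deg, levels):
    """Index of the orientation level that an angle falls into."""
    return int((angle_deg % 180.0) // level_width(levels))
```

With 2 levels every angle is assigned to one of two 90° bins, which suffices for a texture with only horizontal and vertical edges; with 18 levels an orientation of 110° falls into level 11.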

The number of spatial frequencies was changed by 
modifying the size of the discrete frequency transform. 
In all cases the accuracy of the result was not affected 
but different amounts of the textures were captured. 
For a few spatial frequencies only a small piece of the 
texture would appear in the mosaic, and for many fre¬ 
quencies a large area of the original would appear. 

Does the above relate to the accuracy of the primary 
visual cortex? Hubel and Wiesel 9 found the accuracy 
of the orientation neurons to be approximately 10°. 
But there is a major difference between the FORP 
model orientation sensitivities and those of the primary 
visual cortex. The orientation sensitivities of neurons 
overlap much like the spatial frequency sensitivities. It 
can be shown that this may improve the actual orienta¬ 
tion accuracy in the hypercolumn by quite a bit. In 
terms of frequency, nothing is lost as long as the local 
analysis region size changes with frequency accuracy. It 
has been shown that the receptive field sizes in the pri¬ 
mary visual cortex are indeed proportional to their fre¬ 
quency bandwidth. 1 


† D’Astous 1983, p. 55 

A model of the hypercolumn in the primary visual 
cortex that has many of the hypercolumn’s traits has been 
presented. The model seems to be useful for characteriz¬ 
ing texture but more texture samples need to be 
analysed. 

Application of the Visual Cortex Model 

Our visual system, if given the information contained in 
FORP space, could extract further information to 
characterize different textures. Neurons could be con¬ 
nected to these frequency and orientation selective neu¬ 
rons in such a way as to extract texture based features. 

Michael 10 has described the method of inter- 
neuron information transfer. The output of a nerve cell 
is transmitted along its axon and is then connected to 
the input of another nerve cell via a chemical junction 
called a synapse. The synapses can either inhibit the 
nerve cell or excite it. The receiving neuron performs a 
type of summation or integration of all its synaptic input 
which results in an overall excitation level. If this excita¬ 
tion is above a threshold then the neuron will fire and 
transmit its excitation to other connected neurons. If 
the total inhibitory input is larger than the total excita¬ 
tory input then the neuron will be inhibited from firing. 
Each synapse can also have an associated weight or a 
multiplication factor that emphasizes or de-emphasizes 
certain neural inputs. 
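
The summation-and-threshold behaviour described above can be written down as a minimal sketch. It assumes that firing rates are non-negative numbers, that excitatory synapses carry positive weights and inhibitory synapses negative ones, and that a firing neuron passes on its net excitation; the name `neuron_output` is hypothetical.

```python
def neuron_output(inputs, weights, threshold):
    """Weighted sum of synaptic inputs followed by a firing threshold.

    inputs  -- firing rates arriving on each synapse
    weights -- synaptic weights; positive excites, negative inhibits
    """
    excitation = sum(w * x for w, x in zip(weights, inputs))
    # The neuron fires only if the net excitation exceeds the threshold.
    return excitation if excitation > threshold else 0.0
```

With a non-negative threshold, a neuron whose total inhibitory input exceeds its total excitatory input has a negative net excitation and therefore does not fire.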

This structure of inter-neuron connection is an 
effective means of constructing pattern recognizers. 
When a certain pattern of excitations is present on the 
axons of the input neurons, the processing neuron can 
recognize it and fire proportional to the strength of that 
pattern. In the context of the FORP cortical model, 
higher level neurons will be able to recognize certain pat¬ 
terns of firings of the frequency and orientation neurons, 
and inasmuch as these patterns correspond to texture, 
these neurons will recognize textures. 

Figure 7 depicts a simple realization of such a pat¬ 
tern recognizer. The hypercolumn is represented by a 
grid of small boxes. Each box corresponds to a neuron 
that is sensitive to a particular spatial frequency and 
orientation in one local area of the visual field. These 
neurons output to feature detector neurons through inhi¬ 
bitor and excitor synapses. The feature neurons shown 
can recognize very simple patterns in the FORP space, 
and may also receive either inhibitory or excitatory 
input from neighbouring hypercolumns. In Figure 7 the 
Feature 1 neuron will detect a pattern of all one orienta¬ 
tion with a strong low frequency component, a weak 
second frequency component, and a strong third fre¬ 
quency component. The Feature 2 neuron will detect a 
pattern of one strong high frequency component and 
weak components on adjacent frequency and orientation 
neurons. These particular patterns might correspond to 
a particular texture or to a particular characteristic of 
textures. 
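
A feature neuron of this kind can be sketched as a weight template laid over the hypercolumn grid. The grid indices, the weight values and the template below are invented for illustration; only the strong-weak-strong pattern of the Feature 1 neuron is taken from the description above.

```python
def feature_response(forp, template):
    """Rectified weighted sum of hypercolumn firing rates.

    forp     -- maps (frequency_index, orientation_index) to a firing rate
    template -- maps the same keys to synaptic weights
                (positive = excitatory, negative = inhibitory)
    """
    drive = sum(w * forp.get(cell, 0.0) for cell, w in template.items())
    return max(0.0, drive)


# Hypothetical template for the Feature 1 neuron at orientation index 3:
# excited by the first and third frequency components, inhibited by the second.
FEATURE_1 = {(0, 3): 1.0, (1, 3): -1.0, (2, 3): 1.0}

matching = {(0, 3): 1.0, (1, 3): 0.1, (2, 3): 1.0}
uniform = {(0, 3): 1.0, (1, 3): 1.0, (2, 3): 1.0}
print(feature_response(matching, FEATURE_1) > feature_response(uniform, FEATURE_1))
```

The inhibitory weight is what makes the neuron selective: a pattern that is strong at all three frequencies drives it less than the strong-weak-strong pattern it is tuned to.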

Looking at Figures 3, 5, and 6 which are the 
FORP space representations of oriental rattan, diatom 
and J-Cloth™, respectively, it is easy to see how the 
neural mechanism described above could pick out pat¬ 
terns characteristic of each of these textures. Many 
more neural connections and some careful choice of inhi¬ 
bitory and excitatory synapse weights would be needed. 
A feature detector neuron could be designed to detect 
each of these textures since they are so different in 
FORP space. The implication is not that the primary 
visual cortex actually has a neuron for every type of tex¬ 
ture, instead the suggestion is that a computer or electri¬ 
cal realization of such a mechanism could have separate 
texture recognizers if that was the end goal. In the visual 
cortex, the first level of feature extraction neurons would 
deal with texture attributes instead of specific textures. 
Second level neurons could combine these texture attri¬ 
butes to further specify texture. This hierarchical organ¬ 
ization would be more general and more flexible than 
having textures recognized at the first levels. 

One potential problem with this feature extraction 
mechanism is its sensitivity to orientation. A feature 
neuron that fires for a texture at one orientation may 
not fire when presented the same texture at a different 
orientation. People have little difficulty identifying the 
oriental rattan texture regardless of its rotation. 
Perhaps our features should be orientation invariant. 

Jolicoeur 11 demonstrates through perceptual 
experiment that rotational invariance need not exist in 
low level visual processing. He states, "... this experiment 
reveals a clear-cut effect of orientation on identification 
time. ... These results allow us to argue against a general 
model of pattern recognition based solely on the extraction 
of ‘orientation-invariant features’." † 

If texture is to be used in segmentation then 
orientation differences might be useful. Consider a cube 
with identically textured surfaces. When viewed, the 
main difference between the texture of the cube faces 
will be orientation, and hence orientation differences will 
play a major role in segmentation of the cube faces. 
Given these two insights, it would appear that these low 
level texture features need not be orientation invariant. 

Segmentation 

This feature extraction model can be extended to per¬ 
form rudimentary texture based segmentation. Imagine 
the output of these feature cells being coded into a grey 
level depending on which cells were maximally excited. 
Then place these grey levels in a retinotopic map and 
generate a form of two dimensional image. Segmenting 
this image using grey level techniques such as thresholds, 

f Jolicoeur 1985, p. 293 
region growing, or edge detection is equivalent to segmenting 
the original visual scene by texture. The most 
probable candidate in human vision is segmentation via 
edge detection and shape recognition. Edge detection 
mechanisms that exist at this cortical level could be very 
similar to those found in the retina. An edge at this 
level will correspond to an edge between differently textured 
regions. 
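
The coding step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `label_map` and `texture_edges`, the toy response values, and the simple neighbour comparison used as an "edge detector" are all hypothetical and are not taken from the model itself.

```python
def label_map(responses):
    """responses[k][r][c] is the output of feature cell k at map position (r, c).
    Returns a grid holding, at each position, the index of the maximally
    excited cell (the 'grey level' of the retinotopic map)."""
    n_feat = len(responses)
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[max(range(n_feat), key=lambda k: responses[k][r][c])
             for c in range(cols)]
            for r in range(rows)]


def texture_edges(labels):
    """Mark a position when its right or lower neighbour carries a different
    label, i.e. when it lies on a boundary between differently textured regions."""
    rows, cols = len(labels), len(labels[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                edges[r][c] = True
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                edges[r][c] = True
    return edges


# Toy example (hypothetical values): two feature cells on a 2 x 4 map.
# The left half excites cell 0 most strongly, the right half cell 1.
responses = [
    [[0.9, 0.8, 0.1, 0.2], [0.7, 0.9, 0.2, 0.1]],
    [[0.1, 0.2, 0.8, 0.9], [0.3, 0.1, 0.9, 0.7]],
]
labels = label_map(responses)
edges = texture_edges(labels)
```

In this toy case the only marked positions lie along the column where the label changes, which is the texture boundary.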

A combination of intensity edge maps, texture 
edge maps, colour differences, motion detection, and 
binocular depth cues can all be used in separating 
objects and performing image segmentation. Marr [12] 
discusses the use of multiple visual processes in providing 
accurate and stable decisions about surfaces. Each pro¬ 
cess involved in segmentation would provide its best 
information on surface boundaries, but only when all evi¬ 
dence is studied and judged for strength and weakness 
will the segmentation process decide on the final separa¬ 
tion of constituent objects. 

Summary 

The image segmentation computer of the future will 
need to use various processes to mimic human success. 
It will use intensity and texture edges, colour differences 
as well as motion and depth cues. We have proposed a 
model of the primary visual cortex in which texture 
discrimination and segmentation can be performed. 
Demonstrations of the model indicate that similar tex¬ 
tures are represented similarly and dissimilar textures 
dissimilarly. With extensions of further neural processing 
levels it may be possible to construct an efficient 
and effective texture segmentation processor. 
Figure 1 : Single Point in FORP Space 



Figure 2 : Point Moved in Orientation and Frequency 



Figure 3 : Oriental Rattan - FORP Representation 



Figure 4 : Shifted Oriental Rattan - FORP Representation 
Figure 7 : Simplified Model of Hypercolumn Feature Extraction 
Abstract 


In recent years much attention has been 
given to the advantages of multiple 
resolution pre-processing methods in 
computer vision. There is strong 
evidence that the parallel extraction of 
luminance changes over different spatial 
scales also occurs in human visual 
perception. The first experiment 
confirms the strong evidence that 
responses to low resolution signals can 
be elicited as much as 100 msec faster 
than to high spatial frequency stimuli 
of the same contrast. A further 
experiment measured the reaction time to 
discriminate the relative phase of the 
higher frequency component of a 
luminance grating comprising a 
fundamental and its second harmonic. It 
was found that the decision can be made 
more rapidly when the fundamental is low 
than high frequency. On the assumption 
that the gross structure of spatial 
forms is conveyed by low spatial 
frequencies, this supports the idea that 
the substrate for the subjective 
impression that rough descriptions of 
visual forms precede detailed 
perception, is the progressive increase 
in response time with frequency, of the 
visual mechanisms implementing spatial 
filtering. 
Introduction 

Within most natural scenes, intensity 
changes occur over a range of spatial 
scales, so it has been suggested that a 
general purpose vision system may 
require some form of early 
representation that captures and makes 
this explicit. Over the past decade a 
number of multi-resolution schemes have 
been described in the image processing 
literature [1,2,3,4,5,6]. 

Over the same period and independently, 
considerable empirical evidence has been 
amassed from both psychological and 
neurophysiological research, supporting 
the idea that a form of multiple 
resolution representation is employed in 
the early processing stages of 
biological visual systems. It is 
asserted that at each point in the 
retina there exist several contrast 
sensitive mechanisms each detecting 
luminance changes over a different 
spatial scale. As spatial scale is 
equivalent to spatial frequency (for 
signals containing a single frequency 
component) the neural mechanisms can be 
described as filters in the spatial 
frequency domain. The simplest model of 
the filter impulse response is the 
difference of two circularly symmetric 
Gaussian functions. The output of a set 
of filters tuned to the same frequency, 
covering the entire retina is termed a 
channel, and is equivalent to the 
parallel convolution of the image with a 
single operator. Each channel is assumed 
to act independently of any other, hence 
this processing scheme is known as the 
multiple independent channels model [7]. 
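
As a rough illustration of the filter model just described, the sketch below builds a difference-of-Gaussians operator and applies it across a small image to form one "channel". The function names, the kernel size, and the standard deviations (1.0 and 1.6 pixels) are assumptions chosen for the example; the text does not specify them.

```python
import math


def gaussian_2d(x, y, sigma):
    """Circularly symmetric Gaussian with unit volume."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def dog_kernel(sigma_centre, sigma_surround, half_width):
    """Difference of two circularly symmetric Gaussians sampled on a square grid."""
    span = range(-half_width, half_width + 1)
    return [[gaussian_2d(x, y, sigma_centre) - gaussian_2d(x, y, sigma_surround)
             for x in span]
            for y in span]


def channel_output(image, kernel):
    """Apply the same operator at every position where it fits entirely
    inside the image. The kernel is symmetric, so correlation and
    convolution give the same result."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Hypothetical parameters: centre sigma 1.0, surround sigma 1.6, 17 x 17 kernel.
kernel = dog_kernel(1.0, 1.6, 8)
kernel_sum = sum(sum(row) for row in kernel)

# A uniform image contains no luminance change, so the channel stays near zero.
uniform = [[1.0] * 20 for _ in range(20)]
uniform_response = channel_output(uniform, kernel)
```

The operator has an excitatory centre and an inhibitory surround and sums to approximately zero, so it responds to luminance changes at its own scale rather than to mean luminance.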

The independent channels model has been 
remarkably successful in predicting 
contrast thresholds in a variety of 
laboratory experiments, and in 
generating a great deal of further 
research attempting to specify the 
spatial filter characteristics. However 
little attention has been given to their 
organization, function or utility. A 
description of the properties of 
individual mechanisms is insufficient to 
explain the fundamental problems of 
shape description and object 
recognition. 

Early papers proposed that the 
hypothesised neural filters implement 
spectral analysis. If phase information 
is discarded it would then be possible 
to perform pattern recognition with 
translational invariance, and if 
reference templates are stored and 
matched in terms of the ratio of 
frequencies, with size invariance [8,9]. 
However there are several objections to 
this notion. Spatial phase is of 
cardinal importance to the visual 
system. It is still possible to 
recognise objects if amplitude 
information is discarded by equalizing 
all components, provided phase 
information is preserved [10]. The one 
octave bandwidth of the filters is too 
great to permit high resolution spectral 
analysis, and furthermore, while Fourier 
Transform based methods may permit the 
recognition of simple shapes presented 
in isolation, it would be much more 
difficult to achieve more complex tasks 
such as scene segmentation in the 
frequency domain. 

Marr and Hildreth [11] propose a space 
domain function for spatial frequency 
filters, in which at each image 
location, mechanisms tuned to different 
frequencies provide independent evidence 
of luminance changes over different 
scales. This information is then 
integrated in local oriented edge 
detector units. Critical problems with 
this scheme are the alignment of the 
output from different operators, its 
performance on blurred edges and its 
performance in the presence of noise. 

In this paper the bank of spatial 
filters will be considered as a multi¬ 
resolution pre-processing front end, 
which provides a rich description of 
luminance changes over a range of 
spatial scales, upon which higher level 
interpretive processes may operate. 

An advantage of pyramidal processing 
schemes in computer vision is the 
possibility of information flow in 
several directions within the data 
structure [12]. Projection operations 
permit information acquired at low 
resolution to guide processing at higher 
levels. For example in matching 
applications it seems an efficient 
strategy to rapidly discard the maximum 
number of incorrect alternatives by 
first matching at a coarse level of 
resolution, saving the more 
computationally intensive high 
resolution processes for a reduced set of 
alternatives. Similarly in shape 
analysis it is more efficient to first 
locate the approximate boundaries of the 
form with a coarse analysis before 
focussing local operations on optimal 
regions. Data flow in the opposite 
direction involves the integration of 
high resolution information and its 
reduction to lower levels, e.g. block 
quantization. Lateral processing is 
restricted to a single level. It is not 
known whether there are such 
interactions between the levels of the 
multiple resolution structure of the 
human visual system but it is well 
established that the temporal properties 
of low spatial frequency channels 
differ from those tuned to higher 
spatial frequencies in that the former 
mechanisms exhibit a greater sensitivity 
to temporal transients [14]. Low spatial 
frequency mechanisms behave as 
derivative operators to temporal changes 
in contrast while high spatial frequency 
mechanisms operate as temporal 
integrators. Hence the possibility is 
raised that the output of each spatial 
frequency channel is produced 
asynchronously. 

Intuitively it appears that on first 
glance of a scene, we obtain an imme¬ 
diate rough impression of the approxi¬ 
mate forms, locations and extents of 
the principal objects. Perception 
rapidly becomes more detailed over time 
[14]. This form of global precedence can 
be interpreted in terms of the 
independent channels model as the 
progressive acquisition of representa¬ 
tions of increasing spatial frequency. 
Immediately after stimulus presentation 
only the blurred output of low spatial 
frequency filters is available, and over 
a fraction of a second the image repre¬ 
sentation is sharpened by the addition 
of higher frequency components. It is 
not known whether this phenomenon has 
any utility in terms of neural implemen¬ 
tations of projective multi-resolution 
algorithms, or whether it is merely a 
processing bottleneck, a dysfunctional 
epiphenomenon. 

The idea of asynchronous parallel 
channels has several interesting conse¬ 
quences, two of which will be addressed 
empirically in this paper. The first is 
that the visual detection of stimuli 
containing lower spatial frequencies 
should be faster than of stimuli 
containing only higher frequencies. 
Secondly, visual discrimination should 
be faster if the stimuli differ in their 
low spatial frequency content than if 
they only differ in their high frequency 
content. 

Experiment 1 

The first experiment measures simple 
reaction time (sRT) to the presentation 
of sine wave gratings, replicating a 
result obtained by Breitmeyer [15]. The 
stimulus is the abrupt appearance of a 
luminance grating displayed on a CRT 
screen. The observer's task is to 
indicate his detection of the change 
from a blank to a luminance modulated 
screen by pressing a microswitch as 
quickly as possible. 

A very simple model of the subject's 
performance in the reaction time task 
comprises two components, a sensory, and 
a motor stage. On the abrupt 
presentation of a visual stimulus it is 
assumed that only those spatial filters 
tuned to the stimulus produce a sensory 
response, and that their response 
magnitude follows some growth function, 
increasing over a short duration 
following stimulus onset. The decision 
to press the microswitch is taken when 
the perturbation of the sensory output 
function exceeds a criterion value. The 
duration of motor response is assumed to 
be constant (approx. 100 msec), 
independent of sensory factors including 
the spatial frequency of the stimulus. 

The total sRT is the sum of the 
durations of these two stages. Any 
variation in the empirically obtained 
latencies with stimulus spatial 
frequency can therefore be entirely 
attributed to differences in the time 
taken for the outputs of the spatial 
filters to exceed some criterion level. 
The stimulus is a one dimensional 
luminance function sinusoidally modulated 
about a mean level. This is spread vertically 
on the screen by a high 
frequency oscillator to give the 
appearance of a vertical grating. 
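
The two-stage model can be written out as a small sketch. The 100 msec motor time comes from the text; everything else is an assumption made for illustration: the exponential growth function, the criterion value, and the time constants (`tau_msec`) assigned to "low" and "high" frequency filters are hypothetical and are not fitted to the data.

```python
import math

MOTOR_TIME_MSEC = 100.0  # constant motor stage, as assumed in the text


def sensory_latency(contrast, tau_msec, criterion):
    """Time for an assumed growth function
        r(t) = contrast * (1 - exp(-t / tau_msec))
    to exceed the criterion. Returns infinity if it never does."""
    if contrast <= criterion:
        return math.inf
    return -tau_msec * math.log(1.0 - criterion / contrast)


def simple_reaction_time(contrast, tau_msec, criterion=0.05):
    """Total sRT is the sum of the sensory and motor stage durations."""
    return sensory_latency(contrast, tau_msec, criterion) + MOTOR_TIME_MSEC


# Hypothetical time constants: slower growth for filters tuned to higher frequencies.
TAU_LOW_FREQ, TAU_HIGH_FREQ = 50.0, 150.0

gap_low_contrast = (simple_reaction_time(0.1, TAU_HIGH_FREQ)
                    - simple_reaction_time(0.1, TAU_LOW_FREQ))
gap_high_contrast = (simple_reaction_time(0.5, TAU_HIGH_FREQ)
                     - simple_reaction_time(0.5, TAU_LOW_FREQ))

# Sensory latency is recovered from sRT by subtracting the constant motor time.
recovered = simple_reaction_time(0.1, TAU_LOW_FREQ) - MOTOR_TIME_MSEC
```

Under these assumptions the model reproduces the qualitative pattern one would look for: reaction time falls as contrast rises, and the frequency-related difference shrinks at higher contrast.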

Results for one subject at two contrast 
levels are shown in Figure 1. It should 
be possible to obtain estimates of 
sensory latency from the sRT values by 
subtracting from them the constant motor 
time. It can be seen that there is 
indeed an increase in the time course of 
response with increasing stimulus 
frequency. A response to the detection 
of a low contrast 9 cpd grating is not 
made until up to 100 msec after that to 
a 0.5 cpd stimulus of the same contrast. 
It can also be seen that reaction time 
decreases with increasing stimulus 
contrast and the extent of the spatial 
frequency based time difference 
diminishes. On the basis of the 
assumptions made above, this can be 
taken to imply that spatial filters 
tuned to higher frequencies have longer 
latencies than those tuned to lower 
frequencies. 

Experiment 2 

Having confirmed the evidence for 
temporal asynchronies between spatial 
frequency channels in a detection task, 
it was decided to investigate whether 


Figure 1. Reaction Time (msec) to sinusoidal gratings as a function of 
Spatial Frequency (c.p.d.) for one subject (Observer LD) at 2 contrast levels. 

the effect transfers to discrimination. 
Most natural visual stimuli do not 
comprise a single frequency component 
but complex spectra. Furthermore it is 
the phase rather than the amplitude 
spectrum that determines shape 
recognition [10]. Therefore a choice 
Reaction Time (cRT) task was chosen 
requiring the discrimination of the 
relative phase of a two component 
compound grating. If channel asynchrony 
imposes deterministic delays on the 
transmission of information then 
decisions based on higher frequency 
information should be delayed relative 
to those based on low spatial frequency 
information. 

The stimuli used in the experiment were 
the first two sinusoidal components of a 
square wave, added in either square wave 
or triangle wave phase. The luminance at 
each point, L(x) is given by Equation 2: 

$$L(x) = L_m + c \sin(2\pi f x + \phi_1) + \frac{c}{3} \sin(2\pi \cdot 3 f x + \phi_2) \qquad \text{(Equation 2)}$$

where $L_m$ is mean luminance, and $c$, $f$ and $\phi$ are 
contrast, frequency and phase 
respectively. Their luminance profiles 
are shown in Figure 2. Observers had to 
identify each stimulus as rapidly as 
possible. The model for the performance 
of the cRT task is similar to that for 
sRT except that a discrimination process 
must intervene between sensory detection 
and the initiation of the motor 
response. A motor response can only be 
initiated after the detection of the 
third harmonic and the discrimination of 
its phase relative to the fundamental. 
Assuming the speed of phase 
discrimination is constant with spatial 
frequency, it should not be possible to 
make the discrimination until the higher 
frequency information has reached the 
Figure 2. Examples of the luminance 
profiles of the stimuli used 
in Experiment 2. 

decision mechanism. Therefore there 
should be an increase in choice reaction 
time with frequency that should depend 
solely on sensory delay and should 
increase at the same rate as reaction 
time to detect the high frequency 
component presented alone. 

Each stimulus was the sum of two 
sinusoids separated in frequency by a 
factor of three. In each block of trials 
the same fundamental frequency was 
always presented, but the relative phase 
of the harmonic randomly varied between 
either 0 or Pi on each trial with equal 
probability. The subject was given two 
response keys, one assigned to each 
alternative, and instructed to indicate 
the perceived phase relationship as 
rapidly as possible by pressing the 
appropriate switch. The subjects were 
also presented with the third harmonic 
stimuli in isolation in a sRT task. 

Figure 3 shows the results for 4 
subjects. The increase in sRT with the 
frequency of the third harmonic stimuli 
follows the pattern of Experiment 1. The 
results for the cRT task are plotted in 
terms of the frequency of the third 
harmonic. Phase discrimination for the 
low frequency compound stimuli appears 
to take approximately 80 msec longer 
than detection of its slower component. 
If this reflected the additional 
complexity of the phase discrimination 
decision then reaction time to the high 


Figure 3. Results of Experiment 2 illustrating Reaction Time (msec) 
as a function of Spatial Frequency for 4 subjects. 
Closed circles represent mean simple Reaction 
Times to single component gratings. 
Open circles represent mean choice Reaction 
Times for correct responses in the speeded phase discrimination 
task. The symbols are plotted against the frequency of 
the third harmonic component of the compound stimulus. 

frequency compound would be expected to 
show a similar additional delay. However 
the surprising result is that the 
relative phase of the third harmonic can 
be discriminated as rapidly as the 
detection of the same stimulus presented 
in isolation. 

It appears that the slope of the cRT 
function is more nearly parallel to that 
of the sRT function if plotted in terms 
of the frequency of the fundamental, 
rather than the third harmonic. The 
shallower slope follows the shallower 
increase in sRT with frequency at lower 
spatial frequencies and higher 

contrasts. One possible explanation for 
the anomalous result of Experiment 2, 
therefore is that the performance of the 
task is based not upon discrimination of 
relative phase but upon discrimination 
of the peak to peak amplitudes of the 
Figure 4. Examples of the luminance 
profiles of the stimuli used 
in Experiment 3. 


Figure 5. Results of Experiment 3 illustrating Reaction Time (msec) 
as a function of Spatial Frequency for 2 subjects. 
Closed circles represent mean simple Reaction 
Times to single component gratings. 
Open circles represent mean choice Reaction 
Times for correct responses in the speeded phase discrimination 
task. The symbols are plotted against the frequency of 
the second harmonic component of the compound stimulus. 


waveforms. Although the contrasts of the 
two frequency components are the same in 
both phase relationships, the ratio of 
the difference of image maxima and 
minima between the square wave and 
triangle wave is 1.41. So subjects could 
have been responding to the stimulus on 
the basis of global contrast rather than 
relative phase. Therefore it was decided 
to repeat the experiment with a 
condition in which peak to peak contrast 
is controlled. 
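
The 1.41 figure can be checked numerically. The sketch below assumes unit contrast and unit fundamental frequency (neither affects the ratio) and takes the two phase relationships to be 0 and Pi for the third harmonic, as described for Experiment 2; the function names are illustrative only.

```python
import math


def compound(x, harmonic_phase):
    """Fundamental plus third harmonic at one third of its amplitude."""
    return (math.sin(2 * math.pi * x)
            + math.sin(2 * math.pi * 3 * x + harmonic_phase) / 3.0)


def peak_to_peak(harmonic_phase, samples=3600):
    """Difference between the maximum and minimum over one period."""
    values = [compound(i / samples, harmonic_phase) for i in range(samples)]
    return max(values) - min(values)


square_phase_p2p = peak_to_peak(0.0)        # "square wave" phase
triangle_phase_p2p = peak_to_peak(math.pi)  # "triangle wave" phase
ratio = triangle_phase_p2p / square_phase_p2p
```

The ratio comes out at the square root of 2, about 1.41, with the triangle-phase compound having the larger peak-to-peak amplitude even though the component contrasts are identical.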

Experiment 3 

In this experiment the stimulus is a 
compound of a fundamental and a second 
harmonic of equal contrast. The 
luminance at each point is given by 
Equation 3: 

$$L(x) = L_m + c \sin(2\pi f x + \phi_1) + c \sin(2\pi \cdot 2 f x + \phi_2) \qquad \text{(Equation 3)}$$

The luminance profile of stimuli in the 
two phase relationships and two 
fundamental frequencies are shown in 
Figure 4. It can be seen that global 
contrast cannot be used as a cue as the 
stimuli are related by a reflection. sRT 
to the second harmonic stimuli presented 
in isolation was also measured. 
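
The reflection property can be verified with a short numerical sketch. It assumes that the two phase relationships are 0 and Pi for the second harmonic (the text does not state the values used in Experiment 3, so this is a guess carried over from Experiment 2), with unit contrast and unit fundamental frequency.

```python
import math


def compound2(x, harmonic_phase):
    """Fundamental plus second harmonic of equal contrast."""
    return (math.sin(2 * math.pi * x)
            + math.sin(2 * math.pi * 2 * x + harmonic_phase))


samples = 720
xs = [i / samples for i in range(samples)]

# Reflecting one stimulus about x = 1/4 (i.e. x -> 1/2 - x) should give the other.
max_mismatch = max(abs(compound2(0.5 - x, 0.0) - compound2(x, math.pi)) for x in xs)

# Peak-to-peak amplitude over one period for each phase relationship.
p2p = {}
for phase in (0.0, math.pi):
    values = [compound2(x, phase) for x in xs]
    p2p[phase] = max(values) - min(values)
```

Because one profile is a mirror image of the other, their maxima and minima coincide and peak-to-peak contrast carries no information about which stimulus was shown.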


The results for two subjects are shown 
in Fig 5. The increase in cRT is 
parallel to the increase in sRT with 
spatial frequency. Thus when global 
contrast is removed as a cue there does 
indeed appear to be an increase in the 
time taken to perform a discrimination 
task when the decision has to be based 
upon the high rather than low spatial 
frequency content of the stimulus. 

Discussion 

Experiment 1 provided strong evidence 
for the idea of asynchronous detection 
of spatial frequency components when 
presented in isolation. Low frequency 
stimuli can be detected up to 100 msec 
faster than high frequency stimuli of 
the same contrast. That higher contrast also 
speeds detection further 
supports the idea of global precedence, 
as the amplitude spectra of most natural 
stimuli are low pass. Thus the idea of 
asynchronous channel operation can be 
viewed as providing a computationally 
explicit explanation of the intuitive 
observation of global precedence. 

Assuming it is possible to predict the 
detectability of any stimulus at a 
constant mean luminance given a 
knowledge of the frequency spectrum of 
the stimulus and the observer's 
modulation transfer function, it should 
similarly be possible to predict visual 
conspicuity and the growth of detail 
perception over the first few moments of 
inspection of any static visual 
stimulus, from a knowledge of its 
Fourier spectrum and the observer's 
reaction time performance. The results 
of Experiment 3 are also encouraging, 
illustrating that frequency related 
detection latency differences transfer 
to a discrimination task. 

However the analogy between computer 
vision and human multi-resolution 
systems is far from perfect. One major 
problem is raised by the results of 
Experiment 2. While there is good 
evidence for individual spatial 
frequency components being detected 
independently at threshold contrast, the 
evidence is not compelling for 
suprathreshold stimuli. Assuming that 
the channels have a bandwidth of 
approximately 1 octave it should not be 
possible to integrate energy from two 
frequency components separated by a 
factor of 3. That this appears to have 
occurred in Experiment 2, enabling 
global contrast to be used as a cue in 
the discrimination task, implies that 
channel bandwidth in some way increases 
with stimulus contrast. If this is so it 
seriously weakens the idea of 
independent channels. 
ABSTRACT 

Correspondence matching in apparent motion is 
based on two heuristics: match images if they 1) 
have a similar form and 2) are in close proximity. 
Psychophysical experiments are used to define 
these heuristics. Observers judged motion path 
between images in a competition paradigm. Results 
showed that the tokens used in form matching are 
spatial frequency and orientation. Further, prox¬ 
imity is defined in a 3-D spatial reconstruction 
rather than 2-D retinal coordinates. A possible 
representation for the computation of corres¬ 
pondence is a multidimensional detector space, 
with dimensions including spatial frequency, 
orientation, X, Y and Z (or disparity) 
coordinates. 

INTRODUCTION 

A remarkable property of biological vision 
systems is the ability to deduce that two images, 
seen at different places and/or times, represent 
the same physical object. The advantages of this 
property are nicely exemplified in the phenomenon 
of apparent motion: when viewing a series of 
static pictures, or "frames", each object in one 
frame moves to the location of the corresponding 
object in the subsequent frame. Coherent motion is 
perceived only if the visual system, after 
considering successive frames, can properly match 
images corresponding to the same object. This 
"correspondence problem" is presumably solved by 
application of heuristics to provide a "preference 
metric" (13) which evaluates the affinity between 
potential matches. Preference metrics can be 
derived by two general classes of heuristic: 1) 
match images of similar form and 2) match images 
with the greatest spatial proximity. At first 
glance, this recipe for matching seems simple 
enough, but real difficulties arise when imple¬ 
mentation is attempted. These heuristics need to 
be more precisely defined by answering the 
following questions. First, what form primitives 
are used as tokens in correspondence matching? 
Second, is proximity defined in two-dimensional 
retinal coordinates or in an internal, 3-D 
reconstruction of space? This question has 
important implications since use of a 3-D metric 
requires that a depth be assigned to each form 
token before matching can proceed. The studies 
described below address each of 
these questions. 

THE FORM HEURISTIC 

The notion that correspondence matching is 
based partly on form similarity has been around 
for a long time. However, it has proved surprisingly 
difficult to identify correspondence tokens 
since apparent motion tends to be independent of 
form similarity. Early studies (8, 14) found that 
when there was only one image in each frame, the 
apparent motion seen with two identical images was 
readily perceived with two different images. The 
first would deform gradually into the second with 
no loss of motion continuity. More recently 
experimenters (3,11) have used competition methods 
and have likewise concluded that form similarity 
plays no role in token matching. 

Why has it proved so difficult to identify 
correspondence tokens? Two possible explanations 
come to mind. First, stimuli were either geometric 
forms (circles, squares, etc.) or alphabetic 
characters, which differ in high spatial 
frequency content, but are similar in low spatial 
frequencies. Both human psychophysical (1) and 
computer (9) experiments have resulted in the view 
that one representational stage in early visual 
processing is the activity in arrays of detectors 
which are sensitive to edges at different 
resolution. At each resolution level, the 
detectors are activated only by a narrow band of 
spatial frequencies. Geometric shapes and alpha¬ 
betic characters would all stimulate similar 
populations of coarse, low resolution detectors. 
If activity in detectors at different resolution 
were tokens, then there would be strong affinity 
between all such images. Second, previous 
investigators have used a flash technique which 
produced a luminance transient (i.e., a D. C. 
offset) accompanying the presentation of form. The 
luminance flux per se might be used as a token for 
matching. This seems plausible because it has been 
suggested (4) that the visual system contains two 
parallel sets of detectors for analyzing spatio- 
temporal luminance change. The detectors are 
modeled as differences of Gaussians (DOGs) with 
different temporal properties. If the inhibitory 
Gaussian is developed simultaneously with the 
excitatory, then the detector is "sustained" and 
is highly tuned to aspects of form such as spatial 
frequency and orientation. If inhibition is 
delayed, the detector is "transient", responds to 
D. C. flux and shows little selectivity to form. 
Correspondence might be determined by matching 
patterns of activity in these transient detectors. 

I tested these possibilities in a set of 
psychophysical experiments (5). The initial 
assumption was that correspondence matching is 
mediated by the activity of detectors tuned to 
edges of different resolution and orientation. My 
strategy involved patterns which would be much 

more selective in the populations of detectors 
that were being stimulated. To eliminate the 
problem of common low frequency components, I used 
Gaussian modulated sinusoids or "Gabor functions". 
The spatial frequency content of Gabor functions 
is narrow and easily controlled by varying the 
period of the sinusoid. Luminance changes were 
eliminated by ensuring that the time- and space-averaged 
luminance of the Gabors was equal to 
that of the background. 

Stimuli were displayed on a Hitachi high 
resolution monitor driven by a Grinnell graphics 
system. The viewing area was 14 by 12 degrees and 
had a mean luminance of 65 cd/m2. When no stimuli 
were being displayed, the screen was uniform in 
luminance with the exception of a central cross¬ 
hair provided for fixation. 

Targets were Gaussian modulated sinusoids, or 
"Gabor functions". These were created by calculat¬ 
ing a sine-wave function, which varied around the 
mean luminance, and then multiplying the sinusoid 
by a circular Gaussian. The final product appeared 
as a circular patch of sine-wave about 1.7 degrees 
in diameter in which contrast was maximum at the 
center and decreased radially. Unless otherwise 
stated, phase of the sinusoid was 0 degrees with 
respect to the Gaussian function. This was 
necessary to ensure that the space-averaged 
luminance of the Gabor would always be the same as 
that of the background. Contrast of the Gabor 
functions was determined by a matching procedure. 
The Gabor containing the highest central 
frequency, 10 c/deg, was set to 85% contrast. 
Physical contrast of all other Gabors was set to 
the same apparent contrast. 
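
The construction of the targets, and the reason the sinusoid is kept in sine phase, can be sketched as follows. The mean luminance (65 cd/m2), the 85% contrast, and the 1.7 c/deg frequency are taken from the text; the Gaussian standard deviation, the sampling grid, and the way contrast scales the modulation are assumptions made only for this illustration.

```python
import math


def gabor(x_deg, y_deg, freq_cpd, phase, sigma_deg, contrast, mean_lum):
    """Sine-wave modulation around the mean luminance, multiplied by a
    circular Gaussian centred on the origin."""
    envelope = math.exp(-(x_deg ** 2 + y_deg ** 2) / (2 * sigma_deg ** 2))
    carrier = math.sin(2 * math.pi * freq_cpd * x_deg + phase)
    return mean_lum * (1.0 + contrast * envelope * carrier)


def space_averaged_luminance(freq_cpd, phase, sigma_deg=0.3, contrast=0.85,
                             mean_lum=65.0, half_width_deg=1.5, n=61):
    """Mean luminance over a square patch sampled on an n x n grid
    that is symmetric about the centre of the Gaussian."""
    step = 2 * half_width_deg / (n - 1)
    coords = [-half_width_deg + i * step for i in range(n)]
    total = sum(gabor(x, y, freq_cpd, phase, sigma_deg, contrast, mean_lum)
                for y in coords for x in coords)
    return total / (n * n)


# Phase 0 (sine phase): the modulation is odd-symmetric about the centre.
sine_mean = space_averaged_luminance(1.7, 0.0)
# Phase of 90 degrees (cosine phase), shown only for comparison.
cosine_mean = space_averaged_luminance(1.7, math.pi / 2)
```

With phase 0 the bright and dark halves of the patch cancel, so the space-averaged luminance equals that of the background; in cosine phase a small net luminance offset remains, which grows as the carrier frequency is lowered.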

Experiments consisted of a series of trials 
in which the observer viewed a sequence of 4 
frames. As shown in Figure 1, each frame contained 
four Gabors drawn on the circumference of an 
imaginary circle. Frames consisted of two pairs of 
identical Gabors ("A" and "B"). In the 
experiments, A and B represented different values 
along the dimensions of spatial frequency or 
orientation. Figure 2 shows a picture of the 
actual display. In this case A and B differ in 
spatial frequency by 1.5 octaves. Frames 2 through 
4 consisted of the same stimuli rotated by 45 
degrees to new positions. Rotation changed only 
position, not orientation of the Gabors. The 
correspondence problem asks how the Gabors in 
frame 1 decide which Gabor in frame 2 is a proper 
match. Since the distance from A to B or another A 
is equal, there is no a priori reason for motion 
to be clockwise or counter-clockwise. If the 
difference between A and B can be used to 
determine correspondence, then A moves to A and B 
to B. Otherwise, direction should be ambiguous. 

The observers' task in all experiments was to 
discriminate clockwise from counter-clockwise 
motion. On each trial, the sequence of 4 frames 







Figure 2 

was shown twice in succession to produce rotation 
through 315 degrees. Frame duration was 84 msec (5 
sweeps of the raster) and the interstimulus 
interval (ISI) between frames, during which only 
the uniform field was visible, was 17 or 50 msec. 
The only reason for choosing these time intervals 
was that they produced clear motion. The results 
reported below were robust and did not depend 
critically on any particular frame duration or 
ISI. 

In the first set of experiments, A and B were 
Gabors of different spatial frequencies. The left 
panel of Figure 3 shows the results obtained when 
A was fixed at 1.7 c/deg and the distance between 
the centers of similar Gabors was 5.4 degrees. The 
actual distance between Gabors in successive 
frames was 2.3 degrees. This meant that there was 
no overlap in the position of a Gabor from one 
frame to the next. Ability to perceive direction 
of motion was at chance levels when A and B had 
the same value. As spatial frequency of B 
increased, discrimination between clockwise or 
counter-clockwise directions improved until 
performance was perfect. Observers reported that 
their ability to judge direction resulted from a 
coherent motion of the Gabors in a circular path. 
Data in the bottom panel show results obtained 
when spatial frequency of A was fixed at 5.0 
c/deg. Clear spatial frequency tuning of the 
correspondence process is again evident for both 
observers. The tuning of the matching process is 
surprisingly sharp. I estimated that the curves fell 
to half-width/half-height in 0.5 to 1.0 octave, a 
value similar to that found for cells in the 
primary visual cortex and many other psycho¬ 
physical experiments. I repeated the experiment 
with smaller diameter circles to produce different 
amounts of overlap between Gabor positions in 
successive frames. There was no evidence that 
matching was affected by whether or not overlap 
existed between successive frames (5). 



Figure 3 (horizontal axis: spatial frequency, c/deg) 


I earlier speculated that previous studies 
may have failed to find correspondence tokens 
because of the luminance flux which accompanied 
form presentation. To test this possibility, I 
repeated the basic experiment except that the 
background luminance was dark (actually 0.5 cd/m2) 
or half that of the Gabors (32.5 cd/m2). Observers 
failed to perceive strong coherent motion under 
any conditions. 

I also investigated the possibility that 
orientation may be a token for correspondence. For 
this experiment spatial frequency was set at 3.0 
c/deg and orientation difference between A and B 
varied. As shown in Figure 4, correspondence 
exhibits a clear tuning for orientation. Differences 
of 22.5 degrees from the A value produced 
almost perfect performance. Although performance 
was excellent with orientation as the correspondence 
token, observers agreed that the 
coherence of the motion produced by orientation, 
although clear enough to make a correct judgment, 
was never as smooth as for spatial frequency. 



Figure 4. [Plot; horizontal axis: orientation (deg).] 


I concluded from these experiments that 
matching is based on similarity of spatial 
frequency and orientation. This contrasts greatly 
with the previous finding that form similarity 
seems to be important in apparent motion. My view 
is similar to that of Ullman (13), who has 
suggested that the many failures to uncover form 
tokens occurred because experimenters used 
relatively complex images. Shapes such as alphabetic 
characters consist of numerous tokens, so 
that any two images will usually have some tokens 
that match. By using simpler stimuli, isolated 
line segments, he found that orientation was an 
important token in matching. My conclusion differs 
slightly in that it is not the oriented lines that 
are important but rather the activity in 
populations of oriented, narrowband detectors. My 
analysis further differs from that of Ullman since 
I suggest that luminance transients play a large 
role in correspondence. The existence of 
transients may explain why orientation is not 
always found (3) to be a token, even with single 
lines. 

Some (2) believe that there are two 
mechanisms of motion correspondence, the "short-range" 
and "long-range" systems. Ullman also 
concluded that orientation was a token only when 
matching spanned very small spatial separations 
and stimulated short-range mechanisms. The step 
sizes used in my experiments were relatively large 
and likely activate long-range mechanisms. An 
additional demonstration shows that spatial 
frequency is also a token in short-range motion. 
Observers again viewed a series of 4 frames which 
were continuously recycled. Each frame contained 
either 1) a field of uniform luminance, 2) a sine-wave 
grating, 3) a square-wave grating or 4) a missing 
fundamental-wave grating (MF) in which the sine is 
subtracted from the square. The MF wave looks much 
like the square-wave in that it contains an 
alternating series of sharp-edged bars. If frames 
1 and 3 contain square-waves at 0 and 180 degrees 
and 2 and 4 are blank, then only flicker is 
perceived when the frames are cycled. If frame 2 
contains a sine or square at 90 degrees and frame 
4 a sine or square at 270 degrees, then the bars 
appear to march leftward. This presumably occurs 
because both sine- and square-waves contain the 
same fundamental which can be used to compute 
correspondence. If the wave in frames 2 and 4 is 
an MF, then only flicker is perceived, even though 
the patterns in all 4 frames appear very similar. 
This presumably occurs because the MF does not 
contain the low frequency component of the sine 
and square. 

This demonstration dramatizes an important 
point: perception is based on processes occurring 
in a hierarchy of representations, but we "see" 
only the final product. The activity in spatial 
frequency and orientation selective detectors is 
used for correspondence matching but also provides 
the input to higher levels where feature and 
object descriptions are created. Our ultimate 
perception is that of moving objects containing 
features to which we have conscious access. This 
has led many experimenters to look for correspondence 
tokens in various feature or object 
domains. I believe this effort failed partly 
because correspondence is achieved at one level of 
representation while the features and objects are 
constructed at another. Too many experimenters 
employ images which indiscriminately activate 
detectors operating at low levels of representation. 
This is bound to obscure analysis of visual 
processing. The visual scientist's version of 
Occam’s razor is that phenomena ought to be 
explained at the lowest possible level. 

THE PROXIMITY HEURISTIC 

The second unresolved question is whether 
matching employs two or three dimensional proximities. 
Initial studies (12) suggest that 
matching is based on 2-D proximities. In these 
studies, linear perspective was used to produce 
depth separation. I reexamined this conclusion 
using another depth cue, disparity (6). 

The display was similar to that described 
above, except that each frame consisted of random 
dot stereograms, with A and B being disk-shaped 
submatrices of different disparity. Each disk was 
1.3 degrees in diameter and lay on an imaginary 
circle with a 1.8 degree diameter. Figure 5 shows 
how the display appeared to the observers. The 
pairs of disks seemed to float in front of the 
background at different depths. A small red square 
of 0 disparity provided a fixation point. 

Observers viewed a series of 8 such frames in 
which the disks’ positions were rotated by 45 
degree steps. If correspondence matching is based 
on 2-D proximity in the XY plane, then direction 
of rotation is ambiguous: frame 2 contains two 
possible matches equidistant from each object in 
frame 1. If 3-D proximity is used as the distance 
metric, then objects will appear to move to the 
neighbor in the same depth plane and therefore 
closer in 3-D space. 
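
The logic of this prediction can be sketched in a few lines. This is an illustration only, under several assumptions that are not in the text: four disks alternating between two depths, a hypothetical `depth_weight` that converts disparity into the same units as X and Y, and a simple nearest-neighbor rule. The function names are invented for the sketch.

```python
import math

RADIUS = 0.9  # degrees; half of the 1.8 degree circle diameter given above


def frame(step, half_disparity, depth_weight=1.0):
    """Positions (x, y, z) of four disks after `step` rotations of 45 degrees.

    Disks alternate between +half_disparity and -half_disparity (assumption).
    """
    disks = []
    for k in range(4):
        angle = math.radians(45 * step + 90 * k)
        z = depth_weight * (half_disparity if k % 2 == 0 else -half_disparity)
        disks.append((RADIUS * math.cos(angle), RADIUS * math.sin(angle), z))
    return disks


def nearest(source, candidates, use_depth):
    """Indices of the candidate(s) closest to `source` in 2-D or 3-D."""
    dims = 3 if use_depth else 2
    dists = [math.dist(source[:dims], c[:dims]) for c in candidates]
    best = min(dists)
    return [i for i, d in enumerate(dists)
            if math.isclose(d, best, abs_tol=1e-9)]


if __name__ == "__main__":
    first, second = frame(0, 6.0), frame(1, 6.0)
    print("2-D matches:", nearest(first[0], second, use_depth=False))
    print("3-D matches:", nearest(first[0], second, use_depth=True))
```

With a 2-D metric the disk has two equally near candidates, so direction is ambiguous; with depth included, only the neighbor in the same depth plane is nearest. Setting `half_disparity` to 0 restores the ambiguity.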



Figure 5 

When viewing the sequence of frames, 
observers readily perceived clear rotational 
motion with disks moving to neighbors at the same 
depth. Figure 6 shows results from an experiment 
in which disparity between pairs of disks was 
varied symmetrically around 0 disparity. Each data 
point represents the percentage of times direction 
of motion was toward the neighbor at the same 
apparent depth. For large disparities, almost all 
judgments were consistent with the 3-D interpretation. 
At a disparity of 0, since there were two 
equidistant neighbors, direction was ambiguous, 
and fell to chance. I repeated some of our 
observations with displays where the radius of the 
circle was larger (2.5 degrees) and smaller (1.2 
degrees). Similar results were obtained. 



Figure 6 

In the experiment described above, images had 
no monocular cues. We also were forced to use long 
durations since the disks would dissolve into the 
background if the motion were too fast. 

We repeated the experiment with the light squares 
of the disks darkened so that the observers saw 
gray blobs floating against the background. This 
produced monocular cues and allowed us to use a 
higher frame rate, 84 msec per frame. The new 
display also produced a compelling circular motion 
with disks moving toward neighbors in the same 
depth plane, and observers were almost 100% 
correct in discriminating the motion path. To be 
sure that the monocular cues did not contribute to 
ability to judge direction, observers attempted 
the task with one eye occluded. Results showed the 
observers performing at chance. 

Our results are consistent with the view that 
correspondence matching utilizes 3-D proximities. 
In a subsequent experiment, we further 
demonstrated that distance in the X, Y and Z 
planes can be traded off so that objects will 
appear to move in depth when the nearest neighbor 
lies at a different disparity. Based on our 
evidence, it might be expected that 
correspondence matching by computer would be 
enhanced by assignment of depth/disparity to 
images in each frame. This is confirmed by Jenkin 
(7), who found that disparity information was 
effective in producing more accurate matching. 

COMPUTING CORRESPONDENCE 

To compute correspondence both a representation 
and an algorithm must be specified. A possible 
representation is suggested by the studies 
described above. Images would be represented by 
the activity of a multidimensional detector array, 
where each dimension represents a "primitive 
continuum". These are dimensions on which 
detectors are selective, responding only to a 
narrow range of values. Detectors may be 
continuously mapped or the dimension can be 
resolved into discrete segments to facilitate use 
of algorithms such as a Hough transform. If the 
dimensions are resolved, degree of resolution 
might be suggested by psychophysical studies. For 
example, the first experiments indicate that 
spatial frequency could be quantized into 0.5-1.0 
octave steps. 

A metaphor for this array is an n-dimensional 
space where each detector occupies a single 
location. The detectors are blobs, rather than 
points, since detectors have a non-zero bandwidth 
in most dimensions. Important dimensions, 
suggested by the studies described above, include 
spatial frequency, orientation, X, Y and Z 
coordinates (or possibly disparity, depending on 
whether other depth cues are employed). Sustained 
and transient detectors would have to be 
separately represented. There may be other 
important dimensions as well. For example, I have 
been examining whether color might be added to 
this list, but the results so far have been 
ambiguous. 

To perform correspondence matching the 
activity of points at time t1 is compared to 
activity of points at t2. A correspondence 
algorithm might try to match up similarly tuned 
detectors, i.e., nearest neighbors in the 
detector space. The preference metric then becomes 
an index of proximity in the space and would be 
calculated by summing the (weighted?) proximities 
in each of the n dimensions. To accomplish this, 
the metric of the detector space must be found. In 
the simplest case, dimensions are independent and 
the metric is "city-block". Proximity is then 
simply a sum of the distances in each dimension. 
However, it is more likely that the dimensions are 
not independent so that computing proximity would 
not be so simple. For example, if the metric were 
Euclidean, then distance would be derived from 
taking the square-root of the sum of squares in 
each dimension. The situation could be even more 
complicated if different planes have different 
Minkowski metrics. Once the spatial metric is 
known, then matching can proceed by any of the 
standard algorithms, such as relaxation labeling. 
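
As a rough illustration of how the choice of metric enters, the sketch below computes a weighted Minkowski distance between tokens in a hypothetical five-dimensional detector space (log2 spatial frequency, orientation, X, Y, disparity) and matches each token at t1 to its nearest token at t2. The dimension ordering, the unit weights and the greedy nearest-neighbor rule are all assumptions made for the example; a greedy rule is used here only for brevity and is not relaxation labeling.

```python
import numpy as np


def minkowski_distance(u, v, weights, p):
    """Weighted Minkowski distance: p=1 is city-block, p=2 is Euclidean."""
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float(np.sum(weights * diff ** p) ** (1.0 / p))


def match_nearest(tokens_t1, tokens_t2, weights, p=1):
    """For each token at t1, return the index of the nearest token at t2."""
    matches = []
    for u in tokens_t1:
        dists = [minkowski_distance(u, v, weights, p) for v in tokens_t2]
        matches.append(int(np.argmin(dists)))
    return matches


if __name__ == "__main__":
    # Token layout (assumed): (log2 c/deg, orientation, x, y, disparity).
    weights = np.ones(5)
    t1 = [(np.log2(1.7), 0.0, 0.0, 0.0, 0.0)]
    t2 = [(np.log2(5.0), 0.0, 2.3, 0.0, 0.0),    # equally far in space,
          (np.log2(1.7), 0.0, -2.3, 0.0, 0.0)]   # but same spatial frequency
    print(match_nearest(t1, t2, weights, p=1))
    print(match_nearest(t1, t2, weights, p=2))
```

In this toy case the two candidates are equidistant in X and Y, so the match is decided entirely by proximity along the spatial frequency dimension, whichever value of p is used.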

However, given the detector space 
representation, there are many possible variations 
to the scheme outlined above. For example, some 
(10) suggest that each resolution channel be 
computed independently while others (11a) believe 
that cross-channel correspondences should be 
determined first. However the correspondence is 
computed, it apparently must consider a low level 
representation rather than one in feature or 
object domains. 
Abstract 

In this paper we examine the idea that 
texture segmentation comes about by the 
differential outputs of detectors 
(non-linear associative filters) 
computed at each resolvable position on 
the textured surface. Further, we 
consider some of the conditions under 
which "primary" detector outputs are 
dynamically compared and associated to 
develop into a smaller set of "texton" 
profiles which capture the predominant 
differentiating features of the texture 
regions. Comparisons to human 
psychophysical results are made. 

KEY WORDS: texture segmentation, 
associative networks, orientation 
detectors, adaptability 

1. Introduction. 

For a biological visual system 
endowed with a multitude of cells which 
apparently act as feature extractors or 
filters, it seems reasonable to presume 
that visual texture segmentation may 
come about by the differential responses 
of such detectors over the textured 
region. This proposal has received 
experimental and mathematical attention 
over the past decade with 
one-dimensional grey-scaled textures 
(Richards, 1979; Harvey & 
Gervais, 1978) and two-dimensional 
textures (Caelli & Julesz, 1978; Caelli, 
1982, 1985). However, only 
recently has a full computational model 
been proposed which produces 
segmentation as a function of such 
"texton" (Julesz, 1981) outputs, and 
this paper extends the above analyses in 
a number of ways (Caelli, 1985). 

Here texture segmentation is viewed 
as having three component processes: 
(1) spatial decomposition, (2) dynamical 
associative processing, and (3), 
classification of textured regions. The 
specific aims of this model are to 
enable segmentation when the textures 
consist of sparse micropatterns; to 
create networks which will extract, or 
adapt to, the predominant features of 
the texture; and to use a classification 
procedure which is adaptive to the 
outputs of such detectors. 

2. The Model. 

2.1 Level I processing: Spatial 
decomposition and activity 
profiles. 

The initial process of texture 
segmentation is envisaged to involve the 
registration of the input (foveal) 
texture through the parallel outputs of 
many detectors whose responses are 
determined by some non-linear 
transformation of their cross-correlation 
with the input. Assuming a 
relatively fixed "retinal pre-processor" 
having opponent center-surround 
receptive fields, the primary 
information to be processed must have 
differential, or band-pass, components 
emphasized. Further to this, we assume 
the existence of a relatively fixed 
primary projection area where such image 
derivative information is further 
classified (encoded) by cortical edge 
and bar detectors whose outputs are a 
non-linear function of the 
cross-correlation of the detector’s 
profile with the input image. That is, 
we assume that the response $R_i(x,y)$ of a 
detector $d_i$ at retinotopic position 
(x,y) is determined by: 

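$$\sum_{\alpha,\beta} d_i(\alpha,\beta) = \text{const.} \tag{1}$$

and 

$$R_i(x,y) = \text{const} + \gamma\,\psi_\delta\{d_i \circ I\}, \quad \gamma = \text{constant}, \tag{2}$$

$\circ$ denoting cross-correlation between the 
detector and image (I), 

$$d_i \circ I(x,y) = \sum_{\alpha,\beta} d_i(\alpha,\beta)\, I(x+\alpha,\, y+\beta), \tag{3}$$

and 

$$-1 \le \psi_\delta(z) = \frac{1 - e^{-\delta z}}{1 + e^{-\delta z}} \le +1, \quad \delta = \text{constant}. \tag{4}$$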

In our simulations we have used 

$$R_i(x,y) = 128 + 127\,\psi_\delta\{d_i \circ I\}, \quad \delta = 0.03, \tag{5}$$

to fit in with an 8-bit response range. 
The non-linear transducer enables one to 
move smoothly from square wave ("ideal" 
edge and bar) to gaussian modulated 
sinusoid (Daugman, 1983) representations 
for edge and bar, or orientation 
detectors, via $\delta$ in (4). Orientations 
and sizes of the detectors were chosen 
to fit with a large number of 
experimental results on human texture 
discrimination showing the inability of 
observers to resolve image orientations 
to better than ±5° (Caelli, 1982; Beck, 
1983). With evidence that such 
receptive, or "perceptive", fields are 
limited in size to ±1/8 octave, or 1 1/2 
cycles to 1/e decay of a gaussian 
aperture, we have generated 24 
fundamental orientation detectors over 
7x7 pixel kernels (relative to 128x128 
pixel textures) satisfying these profile 
constraints for both edge (odd) and bar 
(even) detectors. 

We secondly assume that the 
response profile for each detector is 
"rectified" into an "activity" profile: 

$$A_i(x,y) = |R_i(x,y) - \text{const}|, \quad \text{const} = 128. \tag{6}$$
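
The Level I steps can be sketched as follows. This is a minimal illustration rather than the simulation code: it assumes the transducer is the sigmoid $(1 - e^{-\delta z})/(1 + e^{-\delta z})$, which equals $\tanh(\delta z/2)$, and it uses a hypothetical 7x7 zero-sum vertical edge kernel with edge padding at the image border in place of the 24 detector profiles, which are not specified here.

```python
import numpy as np


def cross_correlate(image, kernel):
    """Cross-correlation of an odd-sized kernel with the image (edge padded)."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("xyab,ab->xy", windows, kernel)


def transducer(z, delta=0.03):
    """Sigmoid bounded by -1 and +1; tanh form avoids overflow for large |z|."""
    return np.tanh(0.5 * delta * z)


def response(image, kernel, delta=0.03):
    """8-bit style response centred on 128."""
    return 128 + 127 * transducer(cross_correlate(image, kernel), delta)


def activity(image, kernel, delta=0.03):
    """Rectified response: distance of the response from its resting level."""
    return np.abs(response(image, kernel, delta) - 128)


if __name__ == "__main__":
    # Hypothetical odd-symmetric (edge) detector: -1 on the left, +1 on the right.
    edge_kernel = np.zeros((7, 7))
    edge_kernel[:, :3], edge_kernel[:, 4:] = -1.0, 1.0
    image = np.zeros((16, 16))
    image[:, 8:] = 255.0                      # vertical luminance step
    print(activity(image, edge_kernel)[0].round(1))
```

Because the kernel sums to zero, uniform regions leave the response at 128 and the activity at 0, while positions near the luminance step drive the activity toward its ceiling of 127.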

2.2 Level II processing: Adaptive control 
and selection of critical features. 

In contrast to representing texture 
codes by detectors defined at different 
size scales, in gaussian pyramids, etc. 
(see Burt & Adelson, 1983), which would 
be capable of responding to texture 
regions in areas greater than the actual 
micropattern size, the approach adopted 
here remains at the resolution of the 
basic texture, though this is not a 
necessary restriction. Further, the 
process of texture region "filling-in" 
(impletion) is seen as a dynamic process 
involving the iterative activity of 
activated detectors in terms of how 
their responses may spread over 
contiguous regions in a summative 
(averaging) fashion. This is analogous 
to relaxation in image processing 
(Hummel & Zucker, 1983) where the 
strength of a given spatial response is 
reinforced or inhibited as a function of 
neighbouring collaborative or opposite 
evidence. In particular, it is assumed 
that the activity of a given detector $d_i$, 
determined by (6), is updated dynamically 
by the following (associative) "texture 
processing equation": 

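$$A_i^{t+1}(x,y) = \frac{1}{\alpha\beta}\sum_{a=1}^{\alpha}\sum_{b=1}^{\beta} A_i^{t}\!\left(x + a - \tfrac{\alpha}{2},\; y + b - \tfrac{\beta}{2}\right) + \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}\,A_j^{t}(x,y) \tag{7}$$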

where $(\alpha,\beta)$ corresponds to the "region 
of influence" at each iteration. $w_{ij}$ is 
the coupling, or associativity, between 
two activity profiles which can either 
be fixed or adaptive. For the fixed case 
we have used the well-known 
"mass-action" formulation for detector 


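coupling (Grossberg, 1982), where we 
have set: 

$$\forall t: \quad w_{ij}^{t} = w_{ij} = \begin{cases} k, & (i,j) \text{ being (edge, bar) pairs at the same orientation,} \\ \frac{1}{n}\,k, & \text{otherwise,} \end{cases} \tag{8}$$

for n detectors and k being the coupling 
strength such that 

[Equation (9), a condition for all $i$ on the sum $\sum_{j=1}^{n} w_{ij}$, is not recoverable from the source.] 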


In general, it seems unlikely that 
the (neural) connectivity between such 
detector planes can be defined by a 
single stationary matrix $w_{ij}$ over space 
and time. Like Hebb (1949) and Fukushima 
(1984), we assume that the process of 
perceptual learning and adaptation 
involves the dynamic updating of $w_{ij}$ as 
a function of the detectors' response 
strengths and correlations. For these 
reasons, our interests are also focused 
upon investigating formulations for $w_{ij}$ 
such as: 

[Equation (10) is not recoverable from the source.] 

where $a_{ij}$ and $b_{ij}$ correspond to slope and 
intercept regression coefficients of $A_j$ 
on $A_i$, and $p$ to the degree to which this 
correlated information is combined with 
the response of $A_i$ to result in "new" 
detector profiles. This dynamical system 
converges to strong "attractor" 
detectors which (as will be shown) have 
receptive fields related to the first 
few eigenvalues of the coupling matrix. 
Using (10) in (7) requires normalization as: 

[Equation (11) is not recoverable from the source.] 

for 0<p and even. The first component, 

$$\frac{1}{\alpha\beta}\sum_{a,b} A_i^{t}(x+a,\, y+b),$$

being an averaging process (moving 
spatial window), is clearly "local" and 
restricted to a given detector plane. 
That is, activity within a given plane 
spreads as a function of the 
neighbouring activity of the same 
detector type and converges to a mean 
level of activity. Secondly, this 
activity is reinforced as a function of 
the degree to which other detectors 
exhibit similar graded responses over 
the full texture regions, a global 
cooperative component, represented by 
the last component in (11): "synergesis". 


This has the effect of combining 
correlated detector responses and 
converging to common ("attractor") 
profiles, so reducing the number of 
different detectors. Again, it should be 
noted that the solutions are critically 
dependent on the input signal. In the 
case of texture segmentation, where 
broad spatial regions have to be 
"labeled", inhibitory forms of $w_{ij}$ seem 
inappropriate as they differentiate the 
detector responses in further ways. 
This, in turn, would not produce the 
percept of contiguous spatial regions, 
and would be more useful in pattern 
recognition where it is precisely these 
differentiated dimensions which are 
needed. 


Finally, we consider the network to 
"complete" its activity when it reaches a 
near equilibrium state, as measured by: 

$$\frac{1}{n\alpha\beta}\sum_{i=1}^{n}\sum_{x,y}^{\alpha,\beta}\frac{\left|A_i^{t+1}(x,y) - A_i^{t}(x,y)\right|}{A_i^{t}(x,y)} < \delta, \tag{12}$$

$\delta$ being near zero (in our case $\delta$=0.02). 
Here n corresponds to the number of 
detectors and $\alpha\beta$ to the image size. 
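
The iterate-until-equilibrium scheme can be sketched as below. This is a simplified illustration under stated assumptions, not the model's exact update: each detector plane is replaced by its local mean over a square window plus a fixed positive coupling to the other planes, the result is divided by one plus the row sum of the couplings (a normalization chosen here only to keep activities bounded), and iteration stops when the mean relative change falls below a threshold. The function names, window size and coupling values are hypothetical.

```python
import numpy as np


def box_mean(plane, size):
    """Local mean over an odd `size` x `size` window, edge padded."""
    pad = size // 2
    padded = np.pad(plane, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def iterate_to_equilibrium(activity, coupling, size=3, delta=0.02, max_iter=50):
    """Spread activity within planes and couple it across planes.

    activity: array of shape (n_detectors, height, width), positive values.
    coupling: array of shape (n_detectors, n_detectors); diagonal is ignored.
    Returns the final activity and the number of iterations performed.
    """
    a = np.asarray(activity, dtype=float)
    n = a.shape[0]
    off_diag = np.asarray(coupling, dtype=float) * (1 - np.eye(n))
    norm = 1.0 + off_diag.sum(axis=1)          # assumed normalization
    for step in range(1, max_iter + 1):
        local = np.stack([box_mean(plane, size) for plane in a])
        coupled = np.tensordot(off_diag, a, axes=(1, 0))
        new = (local + coupled) / norm[:, None, None]
        change = np.mean(np.abs(new - a) / np.maximum(a, 1e-9))
        a = new
        if change < delta:                      # near-equilibrium criterion
            break
    return a, step


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    planes = rng.uniform(1.0, 100.0, size=(4, 16, 16))
    k = 0.05
    final, steps = iterate_to_equilibrium(planes, np.full((4, 4), k))
    print(steps, final.min().round(2), final.max().round(2))
```

With non-negative couplings every update in this sketch is a weighted average of existing activities, so the values stay within their initial range while the planes become smoother and more alike.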

It should be noted that formulation 
(11) is an example of an associative 
network whose coupling undergoes 
adaptation, and, if we consider the 
problem of texture segmentation as 
primarily involving the extraction of 
the main dimensions for segmentation, 
then it is the eigenvalues of $w_{ij}$ which 
are critical. Further, we could also 
claim, like Kohonen (1977) and Anderson, 
Silverstein, Ritz and Jones (1977), that 
$w_{ij}^{t}$, the network associativity at time t, 
is the primary attribute of the model 
rather than the detector states, per se. 
However, the author feels these claims 
to be too strong since both $\{A_i\}$ and 
$\{w_{ij}\}$ are mutually dependent. Nevertheless, 
in the simulations to be reported we 
shall observe the behaviour of the 
eigenvalues of $w_{ij}$ to investigate how 
these adaptive processes are changing 
the dimensionality of the problem. 

2.3 Level III processing: Region 
classification and decision criteria. 

Since textured regions are proposed 
to appear as a function of position 
response differences in "feature space", 
the appropriate classification process 
seems to be the minimum distance 
classifier (MDC; Ahmed & Rao, 1975). 
This method determines which of two 
textured regions a pixel falls into 
as a function of which texture centroid 
it is closer to. The 
MDC determines the discriminant 
(function) hyperplane which constitutes 
the locus of points equidistant between 
both centroids, and is of the form: 


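$$g(a_1,\ldots,a_n) = \sum_{j=1}^{n}\left(\bar{a}_{1j} - \bar{a}_{2j}\right)a_j - \frac{1}{2}\sum_{j=1}^{n}\left(\bar{a}_{1j}^{2} - \bar{a}_{2j}^{2}\right) \tag{13}$$

where $(a_1,\ldots,a_n)$ correspond to the feature 
dimensions (in this case detector 
outputs), $\bar{a}_{ij}$ corresponds to the mean 
value for group (texture) i on feature 
j, while $a_j$ corresponds to a given input 
texture pixel's feature weights. The pixel 
is classified as a function of the sign 
of $g(a_1,\ldots,a_n)$. 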
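
A minimum distance classifier of this kind can be sketched as follows, assuming the standard two-class linear discriminant in which the sign of g tells which centroid is nearer in Euclidean distance. The function names and the two-feature example values are hypothetical.

```python
import numpy as np


def mdc_discriminant(features, mean1, mean2):
    """g > 0: closer to centroid 1; g < 0: closer to centroid 2; g = 0: equidistant."""
    features = np.asarray(features, dtype=float)
    mean1 = np.asarray(mean1, dtype=float)
    mean2 = np.asarray(mean2, dtype=float)
    return features @ (mean1 - mean2) - 0.5 * (mean1 @ mean1 - mean2 @ mean2)


def classify(features, mean1, mean2):
    """Label each pixel's feature vector as texture 1 or texture 2."""
    g = mdc_discriminant(features, mean1, mean2)
    return np.where(g >= 0, 1, 2)


if __name__ == "__main__":
    centroid1 = np.array([10.0, 0.0])   # mean detector outputs, texture 1
    centroid2 = np.array([0.0, 10.0])   # mean detector outputs, texture 2
    pixels = np.array([[9.0, 1.0], [2.0, 7.0], [5.0, 5.0]])
    print(mdc_discriminant(pixels, centroid1, centroid2))
    print(classify(pixels, centroid1, centroid2))
```

The discriminant is half the difference between the squared distances to the two centroids, so a pixel lying exactly on the hyperplane gives g = 0; the sketch assigns such ties to texture 1 by convention.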

To introduce a degree of 
"fuzziness", or "segmentation strength", 
into this procedure, it would be 
adequate to use the distance between the 
means over the standard deviation of 
both sets (or average t statistic): 

$$t = \frac{1}{n}\sum_{i=1}^{n} t_i, \quad \text{for} \quad t_i = \frac{\bar{a}_{1i} - \bar{a}_{2i}}{\sqrt{s_{1i}^{2}/n_1 + s_{2i}^{2}/n_2}}, \tag{14}$$

where $n_1, n_2$ correspond to the number of 
pixels in each textured region, and $s_{ji}^{2}$ to 
the appropriate variance statistic. 

This function indicates not only 
that adding common features to the 
textures would decrease perceptual 
segmentation, but also that segmentation 
would decrease if more variability in 
detector outputs was observed over 
either, or both, regions. 
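
Both effects can be seen in a small numerical sketch of an average t statistic. It assumes one t value per feature (difference of region means over the pooled standard error), averaged over features, and uses the sample variance with `ddof=1`, which is a choice made here rather than one stated in the text. The function name and the toy data are hypothetical.

```python
import numpy as np


def segmentation_strength(region1, region2):
    """Average per-feature t statistic between two regions.

    Each region is an array of shape (pixels, features).
    """
    r1 = np.asarray(region1, dtype=float)
    r2 = np.asarray(region2, dtype=float)
    n1, n2 = len(r1), len(r2)
    standard_error = np.sqrt(r1.var(axis=0, ddof=1) / n1 +
                             r2.var(axis=0, ddof=1) / n2)
    t_per_feature = (r1.mean(axis=0) - r2.mean(axis=0)) / standard_error
    return float(t_per_feature.mean())


if __name__ == "__main__":
    tight = segmentation_strength([[9], [10], [11]], [[-1], [0], [1]])
    noisy = segmentation_strength([[5], [10], [15]], [[-5], [0], [5]])
    shared = segmentation_strength([[9, 4], [10, 5], [11, 6]],
                                   [[-1, 4], [0, 5], [1, 6]])
    print(tight, noisy, shared)
```

In this toy data, widening the spread of detector outputs within each region lowers the statistic, and so does adding a second feature whose mean is the same in both regions.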

3. Simulations and Conclusions. 

We first summarize the main 
properties of the model: 

(T1) Detector activity is determined by 
the rectified response profiles as a 
result of detector cross-correlation 
with the incoming texture, according 
to (1)-(7). 

(T2) The activity of a given detector at 
time t and position (x,y) is 
determined by the degree to which 
neighbouring regions are also active 
with respect to this detector and 
the activity of others. 

(T3) The associativity between detector 
arrays (i,j) is adaptive to their 
responses. 

(T4) A perceptual classification is made 
after this system of dynamically 
responding detectors reaches 
equilibrium. 

(T5) Classification of positional 
information into textured regions is 
accomplished by a weighted form of 
the minimum distance classifier, 
weighted by the total texture 
"entropy". 

We have implemented all three 
proposed processes (convolution, 
impletion/cooperativity, and 
classification) to quantitatively 
observe the behaviour of the system with 
four critical texture pairs consisting 
of grey-scaled textures differing in 
granularity, simple textures differing 
in orientation, and those differing in 
micropattern space characteristics: T, L, 
etc. We have chosen these latter two 
pairs since (a) it has been shown that 
they differ in discriminability, and 
(b) it has been proposed that they 
require "end-of-line" and intersection 
detectors to discriminate (Julesz, 
1984), the latter of which we can disprove. 
These are shown in Figure 1 together with the 
outputs of the classification procedure, 
using (13) to represent the relative 
"strength" of discrimination. 
Convergence usually occurred within 5-7 
iterations. 



Figure 1. 

Input textures (left column) and 
segmentation resulting from the outputs 
of 24 detectors with no associativities 
(second column), associativities as 
defined by (11) (column three) and 
inhibitory mass action (column four, 
equation 9): k=0.05 in (8). Contrast 
reflects segmentation strength according 
to equation (14). 


To illustrate the effects of 
associativities on decreasing the 
dimensionality of the classification 
process, Figure 2 shows the eigenvalues 
for the solutions shown in columns two 
and three of Figure 1. Such reductions 
in "dimensionality" are clearly related 
to the iterative process converging to 
common strongly active detector profiles 
and inhibiting less active and isolated 
ones. 




Figure 2. 

Eigenvalues for the non-associative 
(solid lines: column 2, Figure 1) and 
associative (dashed lines: column 3, 
Figure 1) segmentation processes. T1 to 
T4 correspond to the 4 textures shown 
in column 1 of Figure 1. 


What connects the texture 
processing equation (7) and the "texton" 
approach is that such cooperative 
networks decrease the dimensionality of 
the problem to the more strongly 
active—though "adaptively 
generated"—detectors or dimensions. 
That is, the profile of each detector in 
the process described by (8) is not 
stationary but, rather, is adapted by 
the energy it is designed to process and 
the activity of other units. Indeed, the 
actual profile at any time is 
recoverable by inverse filtering. 

This model for texture segmentation 
is algebraically similar to a class of 
models for pattern recognition based on 
the associative (coupled) activities of 
large numbers of computational units 
whose activity profiles adapt to the 
signal and network states (see Kohonen, 
1977; Fukushima, 1984). The main 
difference lies in how each 
computational component is interpreted, 
and the involvement of a classification 
scheme at the end, which actually 
produces the textured regions. In this 
sense the model is not formally 
dependent upon the initial edge and bar 
detectors chosen, but rather on the ways 
in which their outputs are correlated 
over space and time according to (7). 

In the present texture processing 
model the nature of the decomposition, 
and so dimensions, of a given texture 
segmentation task is dependent on the 
signal and the type of coupling 
operating between the computational 
units. If the visual system (or, indeed, 
the scientist) were to choose detectors 
which satisfied absolute orthogonality 
($d_i * d_j = 0$, * being convolution) then, 
from a mathematical perspective, the 
ideal detector conditions would be 
present and the cooperative processes 
defined by (11) and (12) would not be 
required. The impletion process 
would still be involved, however, along 
with the classification algorithm. One 
assumption here is that the visual 
system is not that precise in creating 
detectors which, a priori, are so 
independent. Rather, the idea is that 
the visual system converges on the 
central detector profiles by adaptation 
processes like those described 
here, being signal dependent and network 
specific. 

In conclusion, then, we have 
extended an earlier model for texture 
segmentation initially related to the 
"heuristics" of Julesz and Bergen (1983) 
for "preattentive" and "attentive" 
visual processes in spatial vision. The 
model has three components: 
decomposition (via cross-correlation), 
local and global processing, and 
classification. Though many questions 
still remain unanswered, our results 
suggest that these mechanisms, in a 
psychophysical sense, represent the 
types of processing involved in texture 
segmentation. The main result here is 
that the enumeration of detector 
profiles is but one part of the texture 
discrimination process, and that the 
detector profiles "attended to" by the 
visual system are signal dependent, 
not fixed and invariant over all texture 
types, and result from the 
underlying cooperative processes which 
generate "texton" classes to optimize 
the classification process by as few 
dimensions as possible. The proposed 
model does not solve the apparent 
rotation invariance processing 
characteristics of texture 
micropatterns, nor does it propose only 
one form of cooperative process. In this 
case we have found that the global 
process defined by (7) is efficient in 
reducing the dimensionality of the 
classification problem and have proposed 
that such coupling cannot be inhibitory 
if some form of "filling-in" is 
required. 


References 

Ahmed, N. & Rao, K. (1975). Orthogonal 
Transforms for Digital Signal 
Processing. Berlin: Springer. 

Anderson, J.A., Silverstein, J.W., Ritz, 
S.A. & Jones, R.S. (1977). 
Distinctive features, categorical 
perception, and probability 
learning: some applications of a 
neural model. Psychological Review, 
85, 413-451. 

Burt, P.J. & Adelson, E.H. (1983). The 
Laplacian pyramid as a compact image 
code. IEEE Transactions on 
Communications, COM-31(4), 532-540. 

Beck, J. (1983). A theory of textural 
segmentation. In Jack Beck (Ed.) 
Human and Machine Vision. New York: 
Academic Press, 1-38. 

Caelli, T. (1982). On discriminating 
visual textures and images. 
Perception and Psychophysics, 31, 
149-159. 

Caelli, T. (1985). Three processing 
characteristics of visual texture 
segmentation. Spatial Vision, 1(1), 
19-30. 

Caelli, T. & Julesz, B. (1978). On 
perceptual analyzers underlying 
visual texture discrimination: 
Part 1. Biological Cybernetics, 28, 
167-175. 

Daugman, J. (1983). Six formal 
properties of two-dimensional 
anisotropic visual filters: 
Structural principles and 
frequency/orientation selectivity. 
IEEE Transactions on Systems, Man, 
and Cybernetics, SMC-13, 882-888. 

Fukushima, K. (1984). A hierarchical 
neural network model for associative 
memory. Biological Cybernetics, 50, 
105-113. 

Grossberg, S. (1982). Studies of the 
Mind and Brain: Neural Principles of 
Learning, Perception, Development, 
Cognition and Motor Control. Boston: 
Reidel. 

Harvey, L.O. & Gervais, M.J. (1978). 
Visual texture perception and 
Fourier analysis. Perception & 
Psychophysics, 24, 534-542. 

Hebb, D.O. (1949). The Organization of 
Behavior. New York: John Wiley. 

Hummel, R. & Zucker, S. (1983). On the 
foundations of relaxation labelling 
processes. IEEE: PAMI, 3, 267-287. 

Julesz, B. (1981). Textons, the elements 
of texture perception and their 
interactions. Nature, 290, 91-97. 

Julesz, B. (1984). Toward an axiomatic 
theory of preattentive vision. In 
Dynamic Aspects of Neocortical 
Function, G. Edelman, W. Gall, 
W.M. Cowan (Editors). 

Julesz, B. & Bergen, J. (1983). Textons, 
the fundamental elements in 
preattentive vision and perception 
of textures. Bell Syst. Tech. 
Journal, 62, 1619-1645. 




Kohonen, T. (1977). Associative 
memory: a system-theoretic approach. 
New York: Springer. 

Richards, W. (1979). Quantifying sensory 
channels: Generalizing colorimetry 
to orientation and texture, touch 
and tones. Sensory Processes, 3, 
207-229. 




PARALLEL ARCHITECTURES FOR MACHINE VISION 


Steven L. Tanimoto 


Dept. of Computer Science 
University of Washington 


ABSTRACT 

The thrust toward parallel processing in machine 
vision has been unusually intense for several 
reasons: there is a tremendous amount of 
parallelism inherent in algorithms for image 
processing; the spatial regularity of image data 
structures lends itself well to VLSI implementations; 
and programming for parallel systems is 
probably better understood in the context of 
image processing than in any other realm of 
application. The SIMD variety of parallel computer 
systems matches well with many machine-vision 
problems, and these systems permit large amounts 
of parallelism to be applied to a problem with 
relatively little programming effort and 
relatively high efficiency. 

Parallel architectures for machine vision may be 
classified according to the dimension across 
which they achieve their parallelism. Most com¬ 
mon are those that achieve parallelism across an 
image; their different processing elements treat 
different parts of the image simultaneously. 

Other architectures are pipelined, performing at 
any one time the different steps of an algorithm 
on part of a stream of pixels. The image-parallel 
and the pipelined architectures do not exhaust the 
list, for there are more dimensions for parallel¬ 
ization: across goals, across pixels, and across 
neighborhoods, to mention several. 

Some parallel architectures can take advantage of 
more than one dimension of parallelism. The 
"pyramid-machine": architecture is one of these. 
This architecture combines features of meshes such 
as the "CLIP4" and the "Massively Parallel 
Processor" with tree machine capabilities, yield¬ 
ing a system with considerable flexibility and 
efficiency. Some features of a pyramid machine 
under study at the University of Washington are 
described. Algorithms for pyramid machines have 
been investigated by several research groups, and 
they can be classified according to the control 
and data-flow paradigms they follow. 

A challenging problem exists today in developing 
architectures that can bridge the "iconic-to- 
symbolic gap". The high-performance architectures 
for vision that currently exist focus their ef¬ 
forts on either the pixel-level operations or 


symbolic operations quite well, but they are not 
efficient in transforming information from iconic 
form or vice-versa. Some of the proposals for 
bridging this gap are discussed. 

KEYWORDS: machine vision, image processing, parallel processing,
architecture, pyramid machine.

ABSTRACT

We propose a framework for image flow analysis consisting of three
major stages:

- determine moving edges by some local process;

- integrate motion information along linked contours;

- propagate motion estimation through homogeneous regions obtained
inside these contours.

This paper is concerned with the first two stages. A procedure,
based on local modeling and a maximum likelihood scheme, has been
designed to perform the first step. After a linking process, the
constraints provided by the measurements gained from the first
stage can be combined to compute the velocity field along contours
by minimizing a simple functional. To this end, a gradient
algorithm is used with a recursive estimation from one point to its
successor in the chain.


KEYWORDS: image sequence, moving edge determination, motion
estimation, local modeling, maximum likelihood test, stochastic
gradient.

I INTRODUCTION

Image sequence analysis has received more and more attention since
the 1970s. In particular, substantial studies have been concerned
with motion estimation across changing two-dimensional images. Two
main motivations have underlain these research efforts. First,
motion computation represents an attractive challenge: to design a
robust, tractable and general-purpose method. Second, application
areas never stop broadening [1].

Meteorological applications (determining wind fields from cloud
motion estimation [2]) and the military domain (target tracking)
were among the pioneering ones. Then came interframe image coding
for broadcast television or videoconferencing [3,4]. In the last
few years, other potential applications have appeared: biomedical
(e.g., angiocardiography [5]), robotics (mobile robots [6]),
traffic monitoring, graphics, etc. These new domains are interested
in two-dimensional motion not only for its own sake, but as an
intrinsic feature conveying information about the depicted 3D
scene. Indeed, motion in the imaging plane provides primary cues to
relative depth, structure and 3D movements of objects in space
[7,8].

The motion in the imaging plane is usually referred to as the
"optic flow". Optic flow can be represented as a vector field: the
field of apparent velocities of brightness patterns in the image
due to relative motion of camera and objects in space. (Since one
uses a discrete representation of an image sequence, displacement
vector fields and velocity vector fields are usually treated
interchangeably, although they are mathematically of different
nature.)

Discriminating discontinuities in the velocity field is a key
problem in motion estimation schemes of every kind. Indeed,
feature-based methods require cooperative matching procedures [9],
and gradient-based methods involve some smoothing constraint
[10,11]. Thus, we have designed a method whose first task is to
cope with these discontinuities, which are tied to contours in the
image, such as occluding contours, joint ones, etc.

We propose a framework for image flow analysis consisting of three
major stages:

- determine moving edges by some local process;

- link these edges and integrate motion information along contours;

- propagate motion estimation through homogeneous regions obtained
inside these contours.

The first two issues are addressed in this paper.

The first step can be considered as an early processing stage whose
output contains the location and spatial direction of an edge
element and the component of its displacement in the direction
perpendicular to the local orientation of the edge. It is well
known that only such partial motion information can be reached by
local operations (this point is often referred to as the aperture
problem). A maximum likelihood method, based on some local
modeling, has been designed for this purpose.

Then the constraints provided by the measurements gained from the
first stage can be combined to compute the velocity field along
contours, provided that variations in spatial orientation occur
along such contours. This computation results from the minimization
of a simple functional by a stochastic gradient algorithm.

II LOCAL DETERMINATION OF A MOVING EDGE

II.1 - Modeling of a moving edge

An image sequence is considered as a 3D space $(x,y,t)$. A spatial
2D edge in an image is modeled as a small local linear segment.
Hence a moving 2D edge is locally modeled as a small planar patch
in the spatio-temporal 3D space $(x,y,t)$. The direction $\theta$
(w.r.t. the x-axis) of the 2D edge centered at $(x_0,y_0)$ in the
xy-plane at time $t_0$ and its velocity
$V = (dx/dt,\ dy/dt,\ 1)$ determine the orientation of this planar
patch (see Figure 1). This planar modeling is equivalent to the
first-order approximation that most gradient-based methods take
into account.

Figure 1 : Local modeling of a moving edge as a planar patch.
$\theta \in [0,\pi[$ ; $\psi \in [0,\pi/2[$

Let us consider an elementary volume $\Pi$, in the 3D space
$(x,y,t)$, located around point $\ell = (x_0,y_0,t_0)$. Two
hypotheses (or local configurations) are possible:

$H_0$: there is no spatio-temporal edge inside $\Pi$; then the
intensity distribution within $\Pi$ is modeled as $c_0 + b$, where
$c_0$ is a constant and $b$ denotes a zero-mean Gaussian noise with
variance $\sigma^2$.

$H_1$: there is a spatio-temporal edge inside $\Pi$, i.e. a small
planar patch $P$ splitting $\Pi$ into two sub-regions, $\Pi_1$ and
$\Pi_2$. Then the intensity distribution is modeled as $c_1 + b$
within $\Pi_1$ and $c_2 + b$ within $\Pi_2$, where $c_1$ and $c_2$
are two different constants.

The orientation of the planar patch can be defined by the two
following angles: $\theta$ (w.r.t. the x-axis) and $\psi$ (w.r.t.
the t-axis), as illustrated in Figure 1. The component $V^{\perp}$
of $V$ perpendicular to the spatial 2D edge and projected in the
plane $t = t_0$ is given by $V^{\perp} = \tan\psi$. It is obvious
that only this component $V^{\perp}$ can be inferred from the local
determination of this planar patch. Note that the case of a static
edge belongs to hypothesis $H_1$, with $V = (0,0,1)$ and
$\psi = 0$. Indeed such an edge will be considered as a "moving"
edge whose displacement is zero.

The problem now is how to select one hypothesis versus the other.
The test used to decide between these two hypotheses will be
designed using a maximum likelihood scheme.

II.2 - Maximum likelihood test

Details of the mathematical developments concerning the maximum
likelihood test designed for detecting moving edges, along with
estimating their parameters, can be found in [12]. With
$\Phi = (\theta,\psi)$, the test is expressed by:

$$\max_{\ell,\Phi}\ \max_{c_1,c_2}\ \min_{c_0}\ \mathrm{LRV} \ge \lambda \tag{1}$$

where LRV is the log-ratio of the likelihood functions $L_1$ and
$L_0$, respectively associated with hypotheses $H_1$ and $H_0$. The
likelihood function is merely the joint probability density
function of the intensities within the elementary volume $\Pi$. It
is easily derived, since Gaussian distributions are involved and
independent intensity random variables are assumed. $\lambda$ is a
predetermined threshold.

Clearly, hypothesis $H_1$ is selected if the obtained maximum value
of LRV is greater than $\lambda$. Then one can conclude that a
moving edge is located at $\hat{\ell}$ with spatial direction
$\hat{\theta}$ and "perpendicular" velocity
$\hat{V}^{\perp} = \tan\hat{\psi}$, where $\hat{\ell}$,
$\hat{\theta}$, $\hat{\psi}$ are precisely the values of
$(\ell,\theta,\psi)$ which have satisfied criterion (1).


Yet one problem arises. No analytical closed forms can be derived
to express the optimal estimators $\hat{\theta}$, $\hat{\psi}$
corresponding to the geometrical characteristics of the model. Thus
a predefined set of given configurations $\Phi_j$, $j=1,\dots,G$,
will be considered.

For a given geometric configuration $\Phi_j = (\theta_j,\psi_j)$,
the optimal estimators $\hat{c}_i$ ($i=0,1,2$) concerning intensity
aspects satisfy:

$$\frac{\partial\, \mathrm{LRV}(\ell,\Phi_j)}{\partial c_i} = 0$$

which leads to

$$\hat{c}_0 = \frac{1}{n}\sum_{p\in\Pi} f(p)\ ;\quad \hat{c}_1 = \frac{1}{n_1}\sum_{p\in\Pi_1} f(p)\ ;\quad \hat{c}_2 = \frac{1}{n_2}\sum_{p\in\Pi_2} f(p) \tag{2}$$

where $f(p)$ are the observed intensity values within $\Pi$, and
$n$ (resp. $n_1$, $n_2$) is the number of points within $\Pi$
(resp. $\Pi_1$, $\Pi_2$).


II.3 - Computational scheme

It turns out [12] that maximizing LRV amounts to maximizing the
following expression:

$$\mathrm{CRV}(\ell,\Phi_j) = |\hat{c}_1 - \hat{c}_2| \tag{3}$$

Using (2) and (3), we can write the function
$\mathrm{CRV}(\ell,\Phi_j)$ in the form:

$$\mathrm{CRV}(\ell,\Phi_j) = \Big|\sum_{m\in M} a_j(m)\, f(\ell+m)\Big| \tag{4}$$

where $M$ is a set of vectorial indices such that
$\{\ell+m,\ m\in M\}$ represents all points of volume $\Pi$, and
the coefficients $a_j$ depend only on the predefined configuration
$\Phi_j$. Hence the computational implementation of this local
estimation process merely consists of convolution operations.
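
The step from (3) to (4) can be checked numerically: the difference of the two region means is a weighted sum of the intensities, with weight $+1/n_1$ on the points of $\Pi_1$ and $-1/n_2$ on the points of $\Pi_2$. The sketch below is only an illustration under stated assumptions: the boolean array `region1` marking $\Pi_1$, the 2x5x5 volume and the synthetic intensities are hypothetical, and the actual mask geometry is the one defined in [12].

```python
import numpy as np

def mask_coefficients(region1):
    """Coefficients a(m): +1/n1 on points of Pi_1, -1/n2 on points of Pi_2."""
    region1 = np.asarray(region1, dtype=bool)
    n1 = region1.sum()
    n2 = region1.size - n1
    return np.where(region1, 1.0 / n1, -1.0 / n2)

def crv_from_means(f, region1):
    """Expression (3): absolute difference of the two region means."""
    region1 = np.asarray(region1, dtype=bool)
    return abs(f[region1].mean() - f[~region1].mean())

def crv_from_mask(f, region1):
    """Expression (4): absolute value of a weighted sum of intensities."""
    return abs(np.sum(mask_coefficients(region1) * f))

# Hypothetical volume made of two 5x5 submasks, split by a vertical plane.
rng = np.random.default_rng(0)
region1 = np.zeros((2, 5, 5), dtype=bool)
region1[:, :, :2] = True
f = np.where(region1, 100.0, 40.0) + rng.normal(0.0, 1.0, region1.shape)
```

Because the coefficients sum to zero, a constant intensity offset does not change CRV, which is consistent with the mask acting as an edge detector.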

The process for determining moving edges between two successive
images can be summarized as follows. For each point $\ell$ in the
first image:

. calculate $\mathrm{LRV}(\ell,\Phi_j)$ for each mask $\{a_j\}$
corresponding to geometric configuration $\Phi_j$;

. select the configuration $\Phi_k$ which maximizes the function
LRV; if $\mathrm{LRV}(\ell,\Phi_k) \ge \lambda$, then a moving edge
is said to be potentially present at point $\ell$, with estimated
parameters $\Phi_k$ and associated likelihood value denoted by
$\mathrm{CONF}(\ell) = \mathrm{LRV}(\ell,\Phi_k)$; else no moving
edge is determined and $\mathrm{CONF}(\ell)$ is set to 0.

For each previously selected point $\ell$:

- if $\mathrm{CONF}(\ell) > \mathrm{CONF}(\ell_1)$ and
$\mathrm{CONF}(\ell) > \mathrm{CONF}(\ell_2)$, where points
$\ell_1$ and $\ell_2$ are the two neighbours of $\ell$ in the
direction perpendicular to $\theta_k$, then conclude that a moving
edge is located at $\hat{\ell} = \ell$, whose parameters are given
by $\hat{\Phi} = \Phi_k$.

This last step can be interpreted as a thinning procedure. Indeed
it corresponds to the local maximization of CRV with respect to the
location parameter $\ell$, as expressed in (1).
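
The selection and thinning steps can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the likelihood values have already been computed into a hypothetical array `lrv` with one slice per configuration, that the threshold is positive, and that the two neighbours across the edge are found by rounding the unit normal $(-\sin\theta, \cos\theta)$ to the nearest pixel step.

```python
import numpy as np

def select_and_thin(lrv, thetas, lam):
    """lrv: array (G, H, W) of likelihood values, one slice per configuration.
    thetas: length-G edge directions in radians. lam: detection threshold.
    Returns (edges, best): boolean edge map and index of the best configuration."""
    lrv = np.asarray(lrv, dtype=float)
    best = lrv.argmax(axis=0)                 # configuration k maximizing LRV
    conf = lrv.max(axis=0)
    conf[conf < lam] = 0.0                    # CONF = 0 where no edge is detected
    height, width = conf.shape
    edges = np.zeros((height, width), dtype=bool)
    for y, x in zip(*np.nonzero(conf)):
        theta = thetas[best[y, x]]
        # Unit normal to the edge, rounded to the nearest pixel step.
        dx = int(np.rint(-np.sin(theta)))
        dy = int(np.rint(np.cos(theta)))
        neighbours = []
        for sx, sy in ((dx, dy), (-dx, -dy)):
            xx, yy = x + sx, y + sy
            inside = 0 <= xx < width and 0 <= yy < height
            neighbours.append(conf[yy, xx] if inside else 0.0)
        # Thinning: keep only strict local maxima across the edge.
        edges[y, x] = conf[y, x] > max(neighbours)
    return edges, best

# Hypothetical input: a three-pixel-wide vertical ridge of likelihood values.
lrv = np.zeros((2, 5, 7))
lrv[1, :, 2:5] = [5.0, 9.0, 5.0]
edges, best = select_and_thin(lrv, [0.0, np.pi / 2], lam=4.0)
```

In this example only the centre column of the ridge survives thinning, since its two flanking columns have a lower confidence.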


If the volume $\Pi$ intersects $I$ images, CRV can be decomposed
into:

$$\mathrm{CRV}(\ell,\Phi_j) = \Big|\sum_{i=1}^{I}\sum_{m_i\in M_i} a_j(m_i)\, f(\ell+m_i)\Big| = \Big|\sum_{i=1}^{I}\mathrm{CRV}_i(\ell,\Phi_j)\Big| \tag{5}$$

where $M_i$ is a set of vectorial indices such that
$\{\ell+m_i,\ m_i\in M_i\}$ represents all points belonging to
volume $\Pi$ and image $i$. Hence this approach can embrace, with
the same formalism, cases where two or more images are considered.


An additional heuristic is introduced to avoid false detections.
Before concluding that a spatio-temporal edge is present at point
$\ell$ according to criterion (1), the following constraint must be
satisfied:

$$\mu_1 \le |\mathrm{CRV}_i| \,/\, |\mathrm{CRV}_1| \le \mu_2, \quad \text{for all } i \in I\setminus\{1\}$$

where $\mu_1$ and $\mu_2$ are two predetermined thresholds.

More complex modeling, including for instance circular arcs or a
rotation component, could also be handled by the same method. This
would only lead to other sets of masks to be considered in
expression (4).

One advantage of this method is that it has no inherent
restrictions concerning the kinds of edges likely to be
successfully handled (in particular, occlusion boundaries) or
concerning the measurable motion magnitude. The same does not hold
for differential methods. The extent of measured motion is directly
constrained by the smoothing extent used to compute the spatial
gradient of the image intensity. Therefore, the differential
approach is more appropriate for small displacements. Moreover,
flow fields are often incorrect near occlusions, since the
assumptions required for differentiation no longer hold in such
areas. On the other hand, this maximum-likelihood technique can be
CPU-time consuming on a general-purpose computer, but CPU time can
be drastically reduced if an array processor is used.


III COMPLETE ESTIMATION
OF THE VELOCITY FIELD ALONG CONTOURS

In the previous section, a procedure has been described which
detects moving edges and, at the same time, estimates their local
spatial direction and the component of their velocity perpendicular
to the contour. The goal of the second stage is to compute the
component of the velocity vectors tangent to the contour.

In order to achieve the second stage of the image flow analysis,
edge linking is a prerequisite. To this end, only the local spatial
directions of detected moving edges are taken into account.
One-pixel gaps can be filled. The linking technique is similar to
the one presented in [13]. We then get a set of contours, i.e., a
set of chains of linked spatio-temporal edges.

To compute the entire velocity field along the contours, the second
stage of analysis must combine the local measurements yielded by
the first stage. This combination stage is effective if enough
variation in spatial orientation occurs along the obtained
contours. For instance, a straight-line contour remains a singular
case.

Let $\omega = (\omega_x, \omega_y) = (dx/dt,\ dy/dt)$ be the
restriction of $V = (dx/dt,\ dy/dt,\ 1)$ to the plane $(x,y)$. Let
us consider

$$e_\ell(\omega) = \omega \cdot n_\ell - V^{\perp}_\ell \tag{6}$$

where $\omega$ is the velocity field to be estimated, $n_\ell$ is
the unit vector normal to the local edge element at point $\ell$,
$n_\ell = (-\sin\theta_\ell, \cos\theta_\ell)$, and
$V^{\perp}_\ell$ is the measured perpendicular component of
velocity at point $\ell$. $e_\ell(\omega)$ is supposed to be a
stationary random variable.

Then, the measurement of the velocity field $\omega$ along a given
contour $C$ is formulated as the minimization of the following
function:

$$J(\omega) = \mathbb{E}_C\big[(e_\ell(\omega))^2\big] \tag{7}$$

where $\mathbb{E}$ denotes expectation. Motivations for such a
criterion can be found in [14]. A stochastic gradient algorithm is
used to minimize $J(\omega)$. The recursive estimation is pursued
from one point to its successor in the chain. More precisely, it is
expressed as follows:

$$\omega_{\ell+1} = \omega_\ell - \gamma\, e_\ell(\omega_\ell)\, \nabla_\omega e_\ell(\omega_\ell) \tag{8}$$

where $\gamma$ is a gain matrix, and $\nabla_\omega$ denotes the
gradient:

$$\nabla_\omega e_\ell = \Big(\frac{\partial e_\ell}{\partial \omega_x}, \frac{\partial e_\ell}{\partial \omega_y}\Big), \quad \frac{\partial e_\ell}{\partial \omega_x}(\omega) = n^x_\ell = -\sin\theta_\ell\ ; \quad \frac{\partial e_\ell}{\partial \omega_y}(\omega) = n^y_\ell = \cos\theta_\ell .$$

The initial estimate can be given by
$\omega_0 = V^{\perp}_0\, n_0$.

A theoretical proof of such a minimization formulation is given in
[15]. The obtained convergence is quite fast. Two recursion cycles
around a given contour $C$ are usually sufficient for a proper
estimation of the velocity field $\{\omega_\ell,\ \ell\in C\}$.
Moreover, two recursions are performed in parallel, clockwise and
counter-clockwise. Then an average is computed between the two
estimates at each point $\ell$. A smoothness constraint is not
explicitly formulated in the minimization criterion as in [16], but
it is ensured by the recursion along the contour.


IV RESULTS

IV.1 - Results concerning the first analysis stage

Experiments on computer-generated images have been performed in
order to validate the estimation method of moving edges. Different
kinds of motion have been considered (translation of the camera
along its view axis, object rotation in the image plane). Results
are presented in [12].

The algorithm has also been applied to real images. Only two
successive images are considered for each example reported here.
Hence each mask corresponding to the coefficients $a_j$ for each
predefined configuration $\Phi_j$, $j=1,\dots,G$, divides into two
submasks. These $G$ masks are computed once the $G$ geometric
configurations are chosen. Then they are available when images are
processed. Choosing angle $\psi_j$ is equivalent to choosing the
displacement magnitude $V^{\perp}_j$ in the direction perpendicular
to the local linear edge element whose spatial direction is
$\theta_j$. Therefore, the location of the second submask in the
second image with respect to the location of the first one,
centered at the current point $\ell$ in the first image, is given
by $V^{\perp}_j\, n_j$.

Then, the $G$ configurations can also be denoted as
$\{((\theta_r,\psi_q),\ q=1,\dots,Q),\ r=1,\dots,R\}$ with
$R \times Q = G$. Thus the function CRV can be written as follows:

$$\mathrm{CRV}(\ell,\Phi_{rq}) = \Big|\sum_{i=1}^{I}\mathrm{CRV}_i(\ell,\Phi_{rq})\Big| = \Big|\mathrm{CRV}_1(\ell,\theta_r) + \sum_{i\in I\setminus\{1\}}\mathrm{CRV}_i(\ell,\Phi_{rq})\Big| \tag{9}$$

In order to save CPU time, convolution operations with the whole
set of masks, as previously explained, are not actually computed
for each point $\ell$. If
$\mathrm{CRV}_1(\ell,\theta_r) < \alpha\lambda$ (with e.g.
$\alpha = 0.25$), the computations corresponding to the evaluation
of $\mathrm{CRV}(\ell,\Phi_{rq})$, $q=1,\dots,Q$, stop and
$\mathrm{CRV}(\ell,\Phi_{rq})$, $q=1,\dots,Q$, are set to 0.

The first example is extracted from a natural sequence acquired by
a camera and depicting an urban scene. Figure 2 shows the first
image. The set of masks consists of 66 masks including six possible
spatial directions, $\theta$ = 0°, 30°, 60°, 90°, 120°, 150°, and
eleven possible perpendicular displacement magnitudes,
$V^{\perp} = -5,\dots,-1,0,1,\dots,5$ ($G=66$, $R=6$, $Q=11$).
Submask size is 5x5 pixels. The estimated perpendicular
displacement field is presented in Figure 3 along with spatial
edges, for an image part.

An evaluation of the correctness of the resulting estimation is
available. By means of some other tool, motion is known to be three
pixels to the left in the whole image except for subparts
corresponding to hung linen and some bushes. Quite satisfactory
results are obtained.

The second example includes two images of a printer acquired by a
CCD camera (Figure 4). Only the printer has been moved from one
image to the next; camera and background remain fixed. 84 masks
have been considered, that is to say four possible spatial
directions, $\theta$ = 0°, 45°, 90°, 135°, and 21 possible
perpendicular displacements,
$V^{\perp} = -10,\dots,-1,0,1,\dots,10$. (Of course, the metric is
adapted when directions other than horizontal and vertical are
considered.) Determined moving edges are shown in Figure 5 with
their perpendicular displacement, which can possibly be zero.


IV.2 - Results concerning the second analysis stage

Two sets of experiments are presented, involving computer-generated
images including a single polygonal object and two kinds of motion:
uniform translation and in-plane rotation. Superimposed silhouettes
of the object in two successive positions are shown for both cases,
in Fig. 6a and 7a respectively. Of course, the method is not
restricted to such cases.

Fig. 6b and 7b show the perpendicular displacement field. It has
been estimated using the algorithm presented in this paper.

Fig. 6c and 7c show the resulting complete displacement field after
two recursive estimation cycles around the boundary. The last
example shows that a varying displacement field can be successfully
handled.
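
The recursive estimation of Section III that produces these complete fields can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the residual is taken to be $e_\ell(\omega) = \omega\cdot n_\ell - V^{\perp}_\ell$ with $n_\ell = (-\sin\theta_\ell, \cos\theta_\ell)$, a hypothetical scalar gain of 0.5 replaces the gain matrix, and the names `thetas` and `v_perp` and the synthetic square contour are invented for the example.

```python
import numpy as np

def recursive_velocity(thetas, v_perp, gain=0.5, cycles=2):
    """Estimate the 2D velocity at each contour point from normal components.
    Points are visited in chain order; `cycles` passes are made and the
    estimates of the last pass are returned, shape (N, 2)."""
    thetas = np.asarray(thetas, dtype=float)
    v_perp = np.asarray(v_perp, dtype=float)
    normals = np.stack([-np.sin(thetas), np.cos(thetas)], axis=1)
    omega = v_perp[0] * normals[0]            # initial estimate: V_0 n_0
    estimates = np.zeros_like(normals)
    for _ in range(cycles):
        for k in range(len(thetas)):
            residual = omega @ normals[k] - v_perp[k]      # e(omega)
            omega = omega - gain * residual * normals[k]   # step along grad e = n
            estimates[k] = omega
    return estimates

def bidirectional_velocity(thetas, v_perp, gain=0.5, cycles=2):
    """Average the estimates of the two traversal directions at each point."""
    forward = recursive_velocity(thetas, v_perp, gain, cycles)
    backward = recursive_velocity(thetas[::-1], v_perp[::-1], gain, cycles)[::-1]
    return 0.5 * (forward + backward)

# Synthetic square contour translating with velocity (3, -1):
# sides alternate between edge directions 0 and pi/2, ten points per side.
true_velocity = np.array([3.0, -1.0])
thetas = np.repeat([0.0, np.pi / 2, 0.0, np.pi / 2], 10)
normals = np.stack([-np.sin(thetas), np.cos(thetas)], axis=1)
v_perp = normals @ true_velocity
field = bidirectional_velocity(thetas, v_perp)
```

On a straight side the normal is constant, so only the component along that normal is corrected; the tangential component is recovered once the recursion reaches a side with a different orientation, which is why a straight-line contour is a singular case.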

V FUTURE WORK 

Future research directions mainly include: 

- corner displacement estimation (as complementary 
processing after linking) 

- detection of possible motion boundaries along contours in
parallel with the recursive estimation (e.g., this may happen if a
boundary portion is an occlusion boundary) in order to reinitialize
the recursion.

Fig. 6a : Superimposed silhouettes of polygonal object 1

Fig. 6b : Perpendicular displacement field

Fig. 6c : Resulting complete displacement field, with
$\gamma = \begin{pmatrix} 0.03 & 0.01 \\ 0.01 & 0.03 \end{pmatrix}$

Fig. 7a : Superimposed silhouettes of polygonal object 2

Abstract

A computation procedure for detecting 
arbitrary two-dimensional signals 
embedded in scenes and independent of 
orientation/size is developed using 
pipe-line pixel processor procedures. 
Signals are encoded as edge-only 
features and cross-correlated with 
edge-only versions of the input 
scenes—both in cartesian and log-polar 
coordinates. These processes are 
incorporated into a robot visual system 
capable of locating, moving towards, and 
pointing to a target signal, again, 
independent of its size and orientation. 

KEY WORDS: robot vision, pattern 
recognition, edge extraction, invariance 
coding 

1. Introduction. 

Many claims have been made as to the importance of edge information
in coding images (Marr, 1982), pattern recognition (Rosenfeld &
Kak, 1982) and fast computational vision in general. Moreover,
convergent results from human psychophysics (Watt & Morgan, 1983),
vertebrate physiology (Pollen, Andrews & Felden, 1978) and
computational edge processing (Marr & Hildreth, 1980; Leclerc &
Zucker, 1983) point to the "optimal" edge extractor as the logical
intersection of band-pass Gaussian filters or pseudo-gamma function
band-pass filters (Marr & Hildreth, 1980) approximated by
$\nabla^2 G(\alpha)$ operators (Watt & Morgan, 1983; Leclerc &
Zucker, 1983): a Laplacian operator following low-pass Gaussian
filters of specified bandwidths. In this paper we are concerned
with using edge information for pattern recognition or pattern
matching, as restricted to the two-dimensional environment (though
including the problem of matching independent of signal orientation
and size) required in our robot pattern recognition system
(Figure 1).

Edge-based matching techniques are particularly useful in the
context of pipe-line pixel processors, where the execution time for
convolution operations is critically dependent on kernel size, and
the frame memories are restricted to 8- or 16-bit pixel sizes.
Though our general model involves the correlation of edge-only
pattern features at various levels of resolution according to a
strict Laplacian pyramid format, in the present simulations we have
restricted our analyses to the highest level of resolution: the
original 512x512 input format.

A typical pipe-line pixel processor (Arithmetic Logic Unit:
ALU-512) maps frame memories into frame memories with respect to
the logical operations (OR, exclusive OR, AND, 2's complement) and
the usual addition and subtraction operations. Each pass takes 33
msecs and operates on the full image, except when pixel protects
are active. Information passes through a 16-bit register and the
device also has an 8x8 bit multiplier, which enables convolution
operations on 512x512 images with $n \times m$-sized kernels, each
taking $n \times m \times 33$ msecs per convolution.

Figure 1

Robot and restricted visual environment used in simulations.
Various image montages were placed about the walls and the robot's
task was to detect where a specified signal was, move towards and
point to it, based on visual information.

The robot was controlled by 
interrupt-driven subroutine calls 
capable of moving it in any horizontal 
direction and operating a three-joint 
arm according to the processed image 
information, as shown in Figure 1. Both 
image processing (Imaging Technology) 
and robot (RB Robots, Colorado) systems 
were controlled by a PDP 11/23 computer 
operating in RT11. 


2. Edge-extraction, invariance codes, and matching techniques.

As implied above, the $\nabla^2 G(\alpha)$ operator (on an image
$I$) is an adequate representative of the band-pass filtering used
in recent edge-extraction techniques. It is defined by

$$\nabla^2 G(\alpha)(I) = \frac{\partial^2 G(\alpha{:}x,y)(I)}{\partial x^2} + \frac{\partial^2 G(\alpha{:}x,y)(I)}{\partial y^2} \tag{1}$$

where

$$G(\alpha{:}x,y) = e^{-\alpha^2\left((x-x_0)^2 + (y-y_0)^2\right)} \tag{2}$$

Here $(x_0,y_0)$ defines all convolution centers over the image.
This filter is simple to approximate in a pipe-line pixel processor
by two convolution kernels. First, the Gaussian low-pass filter (2)
can be approximated by the recursive use of an averaging kernel of
the form:

$$A = \frac{1}{4}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \tag{3}$$

After $n$ recursions, it produces the bivariate binomial
coefficients on an $(n+1)\times(n+1)$ kernel as:

$$G \approx B(n;x,y) = \frac{(n!)^2}{x!\,y!\,(n-x)!\,(n-y)!}, \quad x,y = 0,\dots,n \tag{4}$$

This takes $n \times 132$ msecs to complete and in the simulations
to be reported here $n$ was set at 10 (a total of
33x4x10 = 1.32 sec).

The $\nabla^2$ or Laplacian kernel is defined (in finite difference
form²) by

$$\nabla^2 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \tag{5}$$

taking 9x33 = 297 msec to complete.

Though various options are available for extracting the image edges
as (x,y) coordinates from the $\nabla^2 G$ images, we have used the
following technique, since it only involves two passes through the
pipe-line pixel processor. Since the output of the
$\nabla^2 G(\alpha)$ operator can be positive or negative, 127 was
added to the output, followed by an .AND. with 128, giving the
complete form:

$$\{E(x,y)\} = \{\text{all } z(x,y) > 0, \text{ from } [G_2[\nabla^2 G_{10}(I) + 127]] \cap 128\} \tag{6}$$


Here, also, a second smoothing operation ($G_2$) was employed to
suppress isolated non-zero points resulting from the
$\nabla^2 G_{10}$ operation. An illustration of these processes is
shown in Figure 2, taking a total time of 1.75 sec. In our
application the edge set of points (6) was (spatially) uniformly
sampled to less than 256 points (rectangular area shown in
Figure 2a), as shown in Figure 4b.
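
The edge-extraction steps of this section can be illustrated in software. The sketch below is only an approximation of the pipe-line procedure: it assumes the standard five-point finite-difference Laplacian, emulates each pass as one shifted, weighted accumulation, and uses a hypothetical step image as input. It checks that repeated 2x2 averaging yields the binomial weights and that adding 127 and masking with 128 selects the positive responses.

```python
import numpy as np
from math import comb

def conv2d_full(image, kernel):
    """Full 2D convolution by accumulating shifted, weighted copies."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + ih, j:j + iw] += kernel[i, j] * image
    return out

def binomial_kernel(n):
    """Apply the 2x2 averaging kernel n times to a unit impulse."""
    box = np.full((2, 2), 0.25)
    kernel = np.ones((1, 1))
    for _ in range(n):
        kernel = conv2d_full(kernel, box)
    return kernel

def binomial_formula(n):
    """Bivariate binomial coefficients, normalized by 4**n to sum to one."""
    row = np.array([comb(n, k) for k in range(n + 1)], dtype=float)
    return np.outer(row, row) / 4.0 ** n

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def positive_response_mask(response):
    """Add 127, then AND with 128: bit 7 is set exactly when the rounded
    response is >= 1 (assuming it lies in the range -127..128)."""
    shifted = np.clip(np.rint(response) + 127, 0, 255).astype(np.uint8)
    return (shifted & 128) != 0

# Smoothed Laplacian of a hypothetical step image.
image = np.zeros((32, 32))
image[:, 16:] = 100.0
smoothed = conv2d_full(image, binomial_kernel(10))
edges = positive_response_mask(conv2d_full(smoothed, LAPLACIAN))
```

The offset-and-mask step works because, for 8-bit values, bit 7 is set only for values of 128 or more, so a single addition and a single logical AND separate positive from non-positive filter outputs.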



Figure 2

(a) Input image ($I$) and (b) edge-only version created by
$G_2(\nabla^2 G_{10}(I) + 127) \cap 128$ (rectangular area in (a)
corresponds to signal used in matching process, Figure 3).

Once this edge code has been established for a given signal,
cross-correlation of the signal with any arbitrary scene can be
reduced to a number of passes through the pipe-line processor equal
to the number of signal points. That is, the cross-correlation
function

$$C_{sg}(x,y) = \iint s(\alpha,\beta)\, g(x+\alpha, y+\beta)\, d\alpha\, d\beta \tag{7}$$

reduces to

$$c_{sg}(x,y) = \sum_{n=x_l}^{x_h}\ \sum_{m=y_l}^{y_h} s(n,m)\, g(x+n, y+m), \tag{8}$$

$(x_l,y_l)$, $(x_h,y_h)$ corresponding to the sub-array containing
the signal. For both $s$ and $g$ being binary valued (edge-only)
images, (8) reduces to

$$c_{sg}(x,y) = \sum_{n,m\in E(x,y)} s(n,m)\, g(x+n, y+m). \tag{9}$$

This correlation function can be executed by adding the image $g$
to itself shifted by the locus of points (x,y values) depicting the
signal, or

$$c_{sg}(x,y) = \sum_{n,m\in E(n,m)} g(x+n, y+m) \tag{10}$$

since $s \cdot g = 1$ if, and only if, both signal and image edge
components are present. (10) is readily executed via a series of
passes through the pipe-line pixel processor where the results are
accumulated, as illustrated in Figure 3b.
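
The equivalence between the sum of products over the signal sub-array and the accumulation of shifted copies of the image can be verified with a short sketch. It is an illustration only: the random binary scene, the signal cut from it and the restriction to shifts that keep the signal inside the image are assumptions of the example, and each loop iteration stands in for one pass through the pixel processor.

```python
import numpy as np

def correlate_direct(signal, image):
    """Sum of products of signal and image over the signal sub-array."""
    sh, sw = signal.shape
    out_h, out_w = image.shape[0] - sh + 1, image.shape[1] - sw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(signal * image[y:y + sh, x:x + sw])
    return out

def correlate_by_shifts(signal, image):
    """Add one shifted copy of the image per signal edge point."""
    sh, sw = signal.shape
    out_h, out_w = image.shape[0] - sh + 1, image.shape[1] - sw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for m, n in zip(*np.nonzero(signal)):      # (row, column) of each edge point
        out += image[m:m + out_h, n:n + out_w]
    return out

rng = np.random.default_rng(1)
image = (rng.random((40, 40)) < 0.2).astype(int)      # binary edge-only scene
signal = image[10:18, 22:30].copy()                   # signal cut from the scene
scores = correlate_by_shifts(signal, image)
```

The score at the position the signal was cut from equals the number of signal points, which is the largest value the correlation can take; the number of passes equals the number of signal points, independent of the size of the sub-array.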



Figure 3

Edge-only matching (cross-correlation) between sampled signal (a:
see Figure 2b) and image (b). Luminance corresponds to the
likelihood of signal presence.

The problem with this matching technique, however, is that it is
not invariant to size and orientation changes of the signal in the
image. This is a particularly relevant problem in the case of robot
vision studied here, since movements of the robot towards the field
wall introduce size and possibly small angular changes in the
signal (Figure 1).


Techniques for matching which overcome rotations of signals have
been recently developed by Hsu et al.⁷ using circular harmonic
functions, whereby a Fourier decomposition of the polar transformed
signal and rotated version is enacted along the orientation
parameter. Matching occurs since both images have identical power
spectra, with the phase giving the orientation differences between
them. The problems associated with these techniques are
(1) determining the initial estimate of the signal and image
centers to enact the polar transforms, and (2) that the Fourier
transform needs to be computed.

An alternate approach to the 
problem is simply to transform edge-only 
images into log-polar coordinates and 
enact matching using pixel processor 
operations. Again, this process has the 
problem of estimating the image center 
for the log-polar transform. For both 
signal and image the best a priori 
initial estimate of the center is the 
peak of the cartesian cross-correlation 
functions; though we are at present 
investigating a pyramid search technique 
to improve the accuracy and the speed of 
estimating the log-polar centre. 

The log-polar (conformal) mapping is:

$$r' = 77.8 \log_e(r_0 + 1) \tag{11}$$

and $\theta = \tan^{-1}\big((y-y_0)/(x-x_0)\big)$, where

$$r_0 = \sqrt{(x-x_0)^2 + (y-y_0)^2} \tag{12}$$

for $(x_0,y_0)$ corresponding to the peaks of the signal
autocorrelation and image cross-correlation images.
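
The mapping can be sketched for a list of edge points as follows. This is a minimal illustration: the function name `log_polar`, the center and the sample points are hypothetical, and the four-quadrant `arctan2` is used in place of the plain inverse tangent so that the angle is defined on the full circle.

```python
import numpy as np

def log_polar(points, center, scale=77.8):
    """Map (x, y) points to (r', theta) about `center`.
    r' = scale * ln(r0 + 1), theta = atan2(y - y0, x - x0)."""
    points = np.asarray(points, dtype=float)
    dx = points[:, 0] - center[0]
    dy = points[:, 1] - center[1]
    r0 = np.hypot(dx, dy)
    return np.stack([scale * np.log(r0 + 1.0), np.arctan2(dy, dx)], axis=1)

center = (256.0, 256.0)
points = np.array([[266.0, 256.0], [256.0, 276.0]])
mapped = log_polar(points, center)

# The same points rotated by 90 degrees about the center: theta shifts by
# pi/2 while r' is unchanged.
rotated = np.array([[256.0, 266.0], [236.0, 256.0]])
mapped_rotated = log_polar(rotated, center)
```

A rotation about the center therefore becomes a pure translation along the angle axis, and a change of size becomes (approximately, because of the +1 term) a translation along the $r'$ axis, which is what allows the shift-based cross-correlation to be reused in log-polar coordinates.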

Figure 4 shows an example of signal and images transformed by the
above procedure. Here, the peak of the log-polar cross-correlation
function determines the particular signal size and orientation
detected, with respect to the center chosen by the cartesian
cross-correlation procedure.

Figure 4

(a), (b) show cartesian edge-only image and sampled signal
respectively, while (c), (d) show corresponding log-polar versions
of (a), (b). (e) shows log-polar cross-correlation image where the
peak defines the orientation and size of the signal embedded in the
scene which best matches the input signal.

Though the basic computational vision procedures are defined and
illustrated, a number of related problems in implementing these
ideas in a pseudo-real-time robot visual system must be resolved,
particularly those related to the analysis of error, setting of
thresholds and adaptation procedures. Even in the restricted
environment used in these simulations, ambient luminance introduces
photon fluctuations such that even redigitizing exactly the same
image does not necessarily result in identical pixel values, albeit
they are close. Of particular variability are the luminance
projections around the field walls about which the robot moves.

For these reasons a signal matching 
threshold was set by digitizing the 
signal and then redigitizing the same 
area and determining the edge-only 
cross-correlation function. The peak 
value could then be used to estimate the 
signal match threshold, which in these 
simulations was set at 60% of the number of 
signal points. Due to the rounding 
errors associated with the log-polar 
transform, the polar signal matching 
threshold was typically set at 
two-thirds that of the cartesian. 
Although these thresholds seem 
remarkably low, the checking procedure 
resulted in no false alarms in our 
restricted simulations—though more 
detailed signal detection analysis is 
being pursued. 
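
The threshold rule described above can be written down in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the 60% fraction and the two-thirds ratio are the values quoted in the text, and treating a peak equal to the threshold as a match is an assumption.

```python
def matching_thresholds(n_signal_points, cartesian_fraction=0.6,
                        polar_ratio=2.0 / 3.0):
    """Return (cartesian, polar) match thresholds for a sampled signal."""
    cartesian = cartesian_fraction * n_signal_points  # 60% of signal points
    polar = polar_ratio * cartesian                   # two-thirds of cartesian
    return cartesian, polar

def is_match(peak, threshold):
    """Assumption: a cross-correlation peak at or above threshold is a match."""
    return peak >= threshold
```

For a signal sampled at 150 edge points this gives a cartesian threshold of about 90 and a log-polar threshold of about 60.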

5. Conclusions. 

In these simulations we have 
demonstrated how pipe-line pixel 
processor technology may be effectively 
employed to enact edge-only matching of 
arbitrary signals to images. Though to 
this stage the choices of thresholds are 
arbitrary, our results using the 
log-polar mapping procedure for matching 
under rotations and size change along 
with the cartesian procedure resulted in 
successful matches for alphabetic 
characters and relatively complex 
images. 

This log-polar matching procedure 
falls down when no evidence of a match 
occurs, that is, when no discernible peak 
occurs in the cross-correlation 
function. In other attempts to solve 
this problem [9] an adaptive procedure is 
used to establish the center. However, 
this is time consuming in that the 
procedure involves changing between 
polar and cartesian coordinates. The 
second limitation of the above procedure 
is the simple use of signal sampling to 
keep the number of signal pan and scroll 
values (coordinates) below 255. This 
does not necessarily select the 
signal features which are more likely to 
remain invariant to light fluctuations, 
image distortions, etc. However, these 
problems are currently under 
investigation, and the matching 
procedure and criteria are being 
analyzed from a signal detection 
perspective. Indeed, if pipe-line pixel 
processors could enact log-polar 
cross-correlations for every possible 
center, then these problems would be 
solved. 

ABSTRACT 

The construction of knowledge-based systems of a 
size large enough to be useful has led to problems 
of knowledge acquisition. A way of solving this 
is to enable the computer to automatically generate 
its own knowledge from sets of sample data. This 
becomes further complicated when the sample data 
may have errors or noise in it. 

This paper describes a system that generates 
knowledge in the form of rules from uncertain data, 
in the domain of computer vision. The way in which 
the uncertainty arises and is processed is discussed, 
and some sample results are presented. 

KEYWORDS: machine learning, fuzzy sets, computer 
vision. 


INTRODUCTION 

The construction of knowledge-based systems of 
a size large enough to be useful has led to problems 
of data acquisition. Expert systems have relied on 
the interaction between a knowledge engineer and a 
domain expert to produce a set of rules that capture 
the expert's knowledge on a particular topic. 
This process is very time-consuming, and as the 
size of the knowledge base increases it becomes a 
limiting factor. In computer vision, this method 
has the additional problem that the language 
necessary to represent the rules is not well-defined. 
The information given to any vision 
system is usually in the form of pixels, but 
formulating rules in terms of pixels is computationally 
expensive and would be difficult for a 
programmer to understand. A higher-level representation 
language is required in order to bring down 
the computation cost and to aid comprehension. 

This paper addresses the subject of ‘machine 
learning from examples’, or equivalently of automatically 
generating rules to describe a concept 
from examples and counter-examples of that concept. 
Desirable properties of such a generation process 
are ease of inclusion of additional problem-specific 
knowledge, and ease of comprehension by a 
user or programmer. The representation of the 
examples and rules is hence of primary importance, 
since to a large extent this will determine the 
range of situations that can be expressed, and the 
manipulations it is possible to perform. 

The problem of interpreting uncertain data 
has received considerable attention from people 
building expert systems, e.g. MYCIN (Shortliffe and 
Buchanan 1), but the problem of learning rules to 
describe uncertain data has been studied less. In 
computer vision there is uncertainty due to imperfect 
image processing and noise. Here this has 
been modelled by the technique of fuzzy sets. 

EXAMPLES AND COUNTER-EXAMPLES 

Objects are made up of sub-objects called 
‘primitives’. The primitives have properties that 
are called ‘attributes’, and there are connections 
between the primitives which are expressed as 
relations. For the purposes of computer vision, 
these primitives are the regions, and the attributes 
may be properties such as shape, size and 
colour; the relations are 2-D spatial relationships 
such as ‘above’ or ‘surrounds’. This 
representation corresponds to a semantic net or 
graph. 



Here shape, size and height are the unary 
descriptors used and these take values of, for 
example, shape=triangle, size=medium and height=6. 
This illustrates the use of two types of unary 
descriptors: 

. nominal descriptors, where the values 
have names. 

. linear descriptors, where the values 
are numbers. 

The two types of unary descriptor are treated 
in different ways. More restrictions are placed 
on linear descriptors since it is assumed that 
they are ordered in a meaningful way, and that 
integer values differing by a small amount give 
rise to similar properties and can be grouped 
together. 

Nominal descriptors have discrete names with 
no ordering implied on them. For example, it is 
not meaningful to describe a shape as halfway 
between a circle and a square, although as will be 
shown later, it is possible to express equal 
uncertainty as to whether a primitive’s shape is 
‘circle’ or ‘square’. 

The only binary descriptor used at present is 
‘is spatially related to’. This takes values of, 
for example, ‘surrounds’ and is a nominal descriptor. 
Each value has an inverse, e.g., 
‘is-surrounded-by’. 
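
One way to picture this representation is as a small semantic net held in ordinary data structures. The sketch below is illustrative only: the class name `Example`, the method names and the `INVERSE` table are hypothetical, and the 'above'/'below' pair is an assumed inverse (the text names only 'surrounds' and 'is-surrounded-by').

```python
from dataclasses import dataclass, field

# Hypothetical inverse table for values of 'is spatially related to'.
# Only surrounds / is-surrounded-by is given in the text; above / below
# is an assumption added for illustration.
INVERSE = {
    "surrounds": "is-surrounded-by",
    "is-surrounded-by": "surrounds",
    "above": "below",
    "below": "above",
}

@dataclass
class Example:
    # primitive name -> {unary descriptor: value}
    attributes: dict = field(default_factory=dict)
    # set of (relation value, primitive a, primitive b) triples
    relations: set = field(default_factory=set)

    def add_primitive(self, name, **attrs):
        """Record nominal (named) and linear (numeric) unary descriptors."""
        self.attributes[name] = dict(attrs)

    def relate(self, a, value, b):
        """Store a binary relation together with its inverse."""
        self.relations.add((value, a, b))
        self.relations.add((INVERSE[value], b, a))
```

Storing the inverse explicitly means a description can be matched starting from either primitive of a related pair.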

GENERAL METHOD 

The central idea in the learning algorithm 
described here is one of ‘generate and test’. The 
method is an extension of the INDUCE algorithm 
(Michalski 2) where a series of trial descriptions 
is generated using a ‘seed’ example, and tested 
against examples and counter-examples. The seed 
example provides the descriptors from which the 
trial descriptions are constructed. After a 
description has been tested, if it is ranked better 
than those before it in the series, according to 
some criterion, it is retained and used to produce 
several more descriptions. The new descriptions 
are produced by specialising the old description. 
If it is no better than those before it, the trial 
description is discarded. 

This guided generation process is equivalent 
to a search. The search is over a space of all 
possible descriptions, consisting of properties of, 
and relations between sub-objects, and this is 
guided by a set of examples of a concept and a set 
of counter-examples. These representatives are 
very important to the working of the algorithm, and 
so a good choice of examples, and more critically 
counter-examples, is essential (Winston 3). 

In simple set terms, if we consider a space of 
all possible objects, then we may represent POS, 
the set of examples, and NEG, the set of 
counter-examples, by the sets shown in the figure below: 



A trial description will cover a number of 
possible objects. Three descriptions are 
represented by the sets in dashed lines, each of 
which has a different property. The set ---- covers 
all the examples and is called a ‘complete 
description’, and the set -.-. covers none of the 
counter-examples and is called a ‘consistent 
description’. The aim of the learning algorithm 
is to produce a number of complete and consistent 
descriptions, for example, the set ...., which can 
then be used on unknown objects to identify them 
as members of the concept. 
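
In set terms the two properties reduce to simple coverage checks. The following is a minimal sketch, assuming a description can be modelled as a predicate that returns True for every object it covers; the function names are hypothetical.

```python
def is_complete(covers, pos):
    """A description is complete if it covers every example in POS."""
    return all(covers(example) for example in pos)

def is_consistent(covers, neg):
    """A description is consistent if it covers no counter-example in NEG."""
    return not any(covers(counter) for counter in neg)
```

A description that covers everything is trivially complete but not consistent, which is why both tests are needed before a description is accepted as a solution.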

The difference between this type of learning 
and a decision tree is the inductive process, 
whereby descriptions of a class of objects are 
induced from the sets of examples and 
counter-examples. The induction in this case is performed 
by generalisation rules which act on a description 
to produce a more general description. In terms 
of the set diagram above, generalisation increases 
the range of objects that the description covers. 

GENERALISATION RULES 

Generalisation is performed only on consistent 
descriptions generated by the learning 
algorithm (i.e. those descriptions which do not 
cover any of the counter-examples). The reason 
for this is that the aim of the algorithm is to 
produce a series of consistent descriptions that 
are as simple as possible. Hence when a consistent 
(but not complete) description is produced, it is 
generalised, in the hope that the new description will 
cover more examples in POS whilst maintaining the 
consistency property. 

There are really only two generalisation rules 
used in the implementation, and they correspond to 
internal disjunction of values of the two types of 
descriptor. They are: 

(i) Adding alternative (or range of 
alternatives). 

(ii) Closing interval. 

(Only the first of these will be described in 
detail.) 

(i) The adding alternative rule works on nominal 
descriptors, using two values of the same descriptor, 
one of which is in the description already, 
and the other which it is desired to include. 
Values of the descriptor may already be grouped 
together, either by the programmer or by the 
system when it has learnt a rule in the past, to 
form a structure. The existing structure of the 
descriptor is searched to find all existing 
groupings of values that include both of the 
required values. For example, the ‘shape’ descriptor 
may have the structure shown below: 


Here ‘quadrilateral’ is defined as ‘square or 
rectangle’ and ‘polygon’ as ‘quadrilateral or 
triangle’ by the user of the program. The structure 
is in this case a tree which can be used as a 
convenient means of generalisation, so that square 
is-a-kind-of quadrilateral is-a-kind-of polygon. 
However, this restricted generalisation has the 
disadvantage that the user has to supply all the 
structure, and that if the combination ‘rectangle 
or triangle’ (but not square) appeared repeatedly 
this could not be expressed efficiently. An 
alternative structuring technique adopted by this 
work is a ‘group if useful’ technique, where 
initially the user can specify as much or as little 
grouping as he sees fit, and the program will group 
together values if it repeatedly finds such a 
process useful. Thus if no structure were supplied 
to the ‘shape’ descriptor, but the combinations 
‘square or rectangle’ and ‘rectangle or triangle’ 
occurred frequently as the algorithm ran, then the 
structure of the descriptor would look like the 
figure below: 

      squ_rec        rec_tri 
      /     \        /     \ 
 square    rectangle     triangle 

The language being used here is clearly less 
comprehensible than that used in the previous 
structure, but for display purposes the original 
‘square or rectangle’, etc. may be used. It has 
the advantages of being easier to manipulate and 
being able to express a wider variety of combinations 
of values. 

Each of the groupings containing the two 
values is used to form a generalised description 
which is tested for consistency, starting with the 
largest grouping (corresponding to the most general 
description) and going on to the smallest. When a 
consistent generalised description is found the 
process stops. It is only required to test this 
series of descriptions on the counter-examples to 
establish the consistency property. If no consistent 
generalised description is found, a group 
consisting of the two values only is created and 
the corresponding description is tested. If this 
fails, the two-value group is deleted and generalisation 
of this descriptor is abandoned. 

(ii) The closing interval rule works in much the 
same way on linear descriptors, using intervals 
including the two values rather than on groupings. 
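
The search order of the adding alternative rule can be sketched as below. This is a hedged illustration rather than the implemented system: the function name `add_alternative`, the dictionary of groupings and the `is_consistent` callback (standing in for a test against the counter-examples) are all hypothetical.

```python
def add_alternative(current, wanted, groupings, is_consistent):
    """Generalise a nominal descriptor from `current` to include `wanted`.

    groupings     : dict mapping group name -> frozenset of values
    is_consistent : callable taking a set of values and returning True if
                    the resulting description covers no counter-example
    Returns the chosen set of values, or None if generalisation is abandoned.
    """
    # All existing groupings that contain both required values.
    candidates = [g for g in groupings.values()
                  if current in g and wanted in g]
    # Try the largest (most general) grouping first, then smaller ones.
    for group in sorted(candidates, key=len, reverse=True):
        if is_consistent(group):
            return group
    # Fall back to a new group holding just the two values.
    pair = frozenset({current, wanted})
    return pair if is_consistent(pair) else None
```

With the 'shape' structure from the text, generalising 'square' to include 'rectangle' would try 'polygon' first and fall back to 'quadrilateral' if a triangle appears among the counter-examples.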

STRUCTURE AND ATTRIBUTES 

The way that the algorithm is implemented is 
to process the binary descriptors first, generating 
structure-only descriptions. Each of these structure 
descriptions is then used as a framework in 
which to run the algorithm for the unary descriptors. 
There are two effects arising from the 
separation of the unary and binary descriptors. 
These are: 

(1) Since structure is treated first, the 
algorithm preferentially generates 
solutions with structural conditions 
rather than attribute conditions. 

(2) Any description obtained with a consistent 
structural part will be 
consistent. 


PREFERENCE CRITERIA 

The progression of the learning algorithm is 
influenced by two separate preference criteria. 
These are now described, and their effect on the 
type of description generated outlined. 

Preference Criterion (1): 

This is used on every description in order to 
quantify how close to being a solution it is. The 
measure used is simply: 

(number of examples in POS covered by description) 
- (number of counter-examples in NEG covered by description). 

Preference Criterion (2): 

The preference criterion used here is a cost 
function which states how successfully a description 
satisfies certain requirements. It is 
evaluated as a weighted sum of the length, cost, 
and degree of generalisation of a description. 
These weights are user defined according to the 
type of description it is wished to generate (e.g. 
long and specific or short and general). The 
contribution from each characteristic will be 
represented by a number between 0 and 1, defined 
as follows: 

(i) Length of description 
(ii) Cost of generating description 
(iii) Degree of generalisation of 
description 

The features used in this preference criterion 
are not exhaustive; for example, in a more complex 
system, computational simplicity, least possible 
memory used in storage and overall comprehensibility 
may be important characteristics for a 
description to exhibit. 
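
Both preference criteria are simple to compute once the coverage counts and feature contributions are known. The sketch below assumes each feature contribution has already been scaled to the range 0 to 1; the function names, feature keys and weights are hypothetical.

```python
def criterion_1(n_pos_covered, n_neg_covered):
    """Examples in POS covered minus counter-examples in NEG covered."""
    return n_pos_covered - n_neg_covered

def criterion_2(features, weights):
    """Weighted sum of feature contributions, each expected in [0, 1].

    features : dict feature name -> contribution in [0, 1]
    weights  : dict feature name -> user-defined weight
    """
    for name, value in features.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"contribution for {name!r} must lie in [0, 1]")
    return sum(weights[name] * value for name, value in features.items())
```

Raising the weight on one feature (for instance length) biases the search towards descriptions that score well on that feature, which is how a user would steer towards long and specific or short and general descriptions.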

UNCERTAINTY 

The main difference between this system and 
those previously implemented is the way the 
quality of data relating to examples is treated. 
For example, a square might be a perfect example 
of a certain concept, but due to the imaging system 
it may not actually have a representation that 
exactly satisfies the axiomatic requirements for a 
square. Nevertheless it may have a certain perceptual 
similarity to a square, and may well be 
one in the actual scene which has become distorted 
in the imaging system. 

There are several alternatives for representing 
uncertainty. In the majority of systems, 
Bayesian Probabilities have been favoured; 
however, Fuzzy Sets and the Shafer-Dempster 
approach (4) have also received attention in recent 
literature. Fuzzy sets (Zadeh 5) were selected to 
represent the uncertainty in the system. 

Each fact is assigned a Fuzzy Truth Value 
(FTV) from 0 to 1. This value represents the 
degree of membership of the fact in the fuzzy set 
of true facts. Hence, a description which matches 
a series of facts from an example or counter-example 
will have a list of FTVs associated with it. These 
are then combined to give an overall fuzzy truth 
value for the description. If the description is 
made up of n descriptors, and the jth descriptor 
matches a fact in a specific Example with FTV $\mu_j$, 
then the FTV of the entire description (or the 
Degree of Fit of a description to an Example, E) 
is defined as: 

$\mu_j > 0.5 \qquad \mu_j < 0.5 \qquad (1)$ 

The function F(E) has the following properties: 

(i) Simple polynomial form. 

(ii) Sensitive to all truth values (unlike 
MAX or MIN). 

(iii) Independent of order of truth values. 

(iv) Facts with FTVs<0.5 are given greater 
weight in the calculation than those 
with FTVs>0.5 . 

(v) $F(E) < \mu_j$ for $n = 1$, $0.5 < \mu_j < 1$. 

EFFECTS OF THE INTRODUCTION OF UNCERTAINTY 

The definitions of Consistency and Completeness 
now become dependent on the degree of fit. 
A consistency threshold Tconsistent is set such 
that a description will not be consistent if 
F(CE) > Tconsistent for any counter-example CE. A 
completeness threshold Tcomplete is also set. If 
F(E) > Tcomplete for an example E then that example 
is defined to be covered by that description. 
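
The thresholded definitions can be sketched as follows, taking the degree of fit values as already computed (the degree of fit function itself is not reproduced here). The function names are hypothetical.

```python
def covered_examples(fits_pos, t_complete):
    """Indices of examples E with F(E) > Tcomplete."""
    return [i for i, f in enumerate(fits_pos) if f > t_complete]

def is_consistent(fits_neg, t_consistent):
    """Inconsistent as soon as F(CE) > Tconsistent for any counter-example."""
    return all(f <= t_consistent for f in fits_neg)
```

Note that an example whose degree of fit exactly equals the threshold is not counted as covered, following the strict inequality in the text.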

The introduction of uncertainty into the 
definitions of Consistency and Completeness affects 
the evaluation of the Preference Criteria. With 
Preference Criterion (1) the definition is unchanged 
except that the number of examples in POS covered 
by the description will be those examples with 
degree of fit greater than the completeness 
threshold. Similarly, the counter-examples in NEG 
covered by a description will be those with degree 
of fit greater than the consistency threshold. 

Preference Criterion (2) is affected by the 
introduction of two new factors: the consistency 
and completeness ratings of a description, defined 
as follows: 

(i) Consistency of description 

If the consistency threshold is exceeded by 
any counter-example then the consistency condition 
is broken and the consistency rating is set to 
zero. If the degree of fit for the ith 
counter-example is F(CEi) (i=1..n) then: 

$$ \text{Consistency Rating} = 1 - \frac{1}{n} \sum_{i} \frac{F(CE_i)}{T_{consistent}} \qquad (2) $$

(N.B. If F(CEi)=0 for all i 
then Consistency Rating=l) 

(ii) Completeness of description 

If the degree of fit for the ith example is 
F(Ei) (i=l..n) then: 

$$ \text{Completeness Rating} = \frac{1}{n} \sum_{i:\, F(E_i) > T_{complete}} F(E_i) \qquad (3) $$

(N.B. If F(Ei)=l for all i 
then Completeness Rating=l) 
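
A possible computation of the two ratings is sketched below. It rests on an assumed reading of equations (2) and (3), whose typesetting is damaged in this copy: the consistency rating is taken as 1 minus the mean of F(CEi)/Tconsistent, and the completeness rating as the sum of F(Ei) over covered examples divided by n. Both readings agree with the two N.B. conditions, but they remain assumptions, as do the function names and the handling of empty lists.

```python
def consistency_rating(fits_neg, t_consistent):
    """Assumed form: 1 - (1/n) * sum(F(CEi) / Tconsistent)."""
    if not fits_neg:
        return 1.0  # assumption: no counter-examples means fully consistent
    if any(f > t_consistent for f in fits_neg):
        return 0.0  # consistency condition broken
    return 1.0 - sum(f / t_consistent for f in fits_neg) / len(fits_neg)

def completeness_rating(fits_pos, t_complete):
    """Assumed form: (1/n) * sum of F(Ei) over examples with F(Ei) > Tcomplete."""
    if not fits_pos:
        return 0.0  # assumption: nothing to cover
    covered = [f for f in fits_pos if f > t_complete]
    return sum(covered) / len(fits_pos)
```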

Hence, the evaluation of the Preference 
Criterion now becomes a weighted sum of five 
features: length, cost, degree of generalisation, 
consistency, completeness. The introduction of 
Completeness and Consistency ratings has two 
effects in guiding the system. By weighting in 
favour of Completeness the system can be biased to 
include all positive examples. By weighting in 
favour of Consistency the system can be biased 
against including any counter-examples. 

RESULTS 

This section shows the results of running the 
algorithm on a synthetic image, before and after 
adding Gaussian noise to it. The different 
descriptions generated in each case are given: 



Processed Version of Perfect Input Data. 
Rule Generated: 

‘There are two objects X and Y such that 
(X surrounds Y and 
Y is a square or a circle)’ 

Processed Version of Imperfect Input Data. 
Rule Generated: 

‘There are three objects X, Y and Z such that 
(X surrounds Y and 
Y is a rectangle) or 
(X is right of Y and 
X is right of Z and 
X is a rectangle)’ 

The result of adding the noise is that the 
surrounded objects in the examples cannot be 
reliably labelled as a square and circle as before. 
The object in the top example is now considered 
more likely to be a rectangle and the surrounded 
object in the bottom example is too degraded to be 
incorporated as part of a rule. This results in 
the second half of the above rule being generated. 

DISCUSSION 

In its present form, the learning system 
described has several problems associated with it, 
due to the incorporation of uncertainty in the 
algorithm. Some of these problems are described 
below. 

Coverage of Seed Example. 

The INDUCE algorithm is guaranteed to give a 
solution and terminate eventually (when working on 
noise-free data), even if the rule obtained is a 
disjunctive list of the examples in POS. (In this 
case, no induction has been performed by the 
system.) The algorithm terminates 
because all the descriptions generated using a 
‘seed’ example are partial descriptions of the 
example and hence cover it. As the algorithm 
builds up longer partial descriptions of the seed 
example, the set of objects covered by the description 
becomes smaller, until eventually only the one 
example is covered. 

The use of the degree of fit measure defined 
above means that partial descriptions of the 
example will not necessarily ‘cover’ (in the fuzzy 
sense) the seed chosen. A consequence of this is 
that the algorithm cannot be guaranteed to give a 
solution, unless some other constraints are placed 
on it. If the situation occurs in which the seed 
example may not be described without a 
counter-example also being covered, then to all intents 
and purposes the descriptors chosen do not 
discriminate between this example and the 
counter-example. This may be remedied in one of two ways: 

(i) Alter the Tcomplete threshold to 
discover whether any setting will give 
discrimination. 

(ii) Use a better set of descriptors. 

Degree of Fit Measure. 

The degree of fit measure as defined in 
Equation 1 has the property that if two facts in an 
example with truth values of 0 and 1 respectively 
were matched to two descriptors making up a 
description, then that example would have a degree 
of fit of 0.5 to that description, in spite of the 
FTV of 0 which is designed to represent the falsity 
of that fact. This is resolved by making the 
additional assumption that if the degree of membership 
of a fact is less than a threshold value T 
(e.g. T=0.3), then it is deleted from the database. 
This can prevent the matching of low membership 
facts and cut down the processing done by the 
algorithm. 
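
The pruning step amounts to a single filter over the fact database. A minimal sketch, assuming the database can be modelled as a mapping from facts to FTVs (the function name and the fact encoding are hypothetical):

```python
def prune_facts(facts, t=0.3):
    """Delete facts whose degree of membership is less than threshold t.

    facts : dict mapping a fact (any hashable) to its FTV in [0, 1]
    """
    return {fact: ftv for fact, ftv in facts.items() if ftv >= t}
```

A fact with FTV 0 can then never be matched, so it can no longer pull a description's degree of fit up to 0.5 in combination with a fully true fact.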


Interdependence of Certainty Values. 

This problem is perhaps best illustrated by an 
example. Consider the two primitives in the figure 
below: 



If primitive 2 is a square, then it is not 
touching primitive 1. 

If primitive 2 is a circle, then it is 
touching primitive 1. 

In other words, the certainty of the relation 
‘1 is touching 2’ is dependent on the interpretation 
of the shape of primitive 2. For simplicity, 
however, it is assumed that the facts describing 
the examples are independent of each other, to an 
approximation. 

CONCLUSIONS 

In this work, a machine learning scheme for 
computer vision that models the effects of introducing 
uncertainty has been implemented. At 
present, this work is at an early stage and has 
only been applied to simple, synthetic image data 
to investigate the changes that occur when 
uncertainty is present. From the results obtained 
to date, it seems that the rules that are learnt 
from perfect data may differ significantly from 
those obtained from the imperfect equivalent. 

ABSTRACT 

In this paper we describe a method for 
estimating boundaries between perceptually distinct 
regions in an image. The method is a two-step 
procedure which first identifies image regions which 
exhibit uniformity. Boundaries between uniform 
regions are approximated by a complete cubic spline 
function and the linear recursive filter is used to 
estimate the values of the function at the knots. 
Boundaries are assumed to be smooth; however, 
abrupt changes in boundary location may 
infrequently occur. In these exceptional cases 
flexibility of boundary approximation is achieved by 
adding additional knots at the positions where abrupt 
changes occur. 


KEYWORDS: segmentation, splines, texture, linear 
recursive filter. 


1. INTRODUCTION 

Determination of boundaries delimiting objects 
and their parts in an image is recognized to be a 
crucial link between an image and its interpretation. 
It is well recognized that boundary detection in 
images wherein meaningful regions exhibit textural 
properties is a difficult task. Regions in such images 
can be identified by region based segmentation 
operators, e.g. [10,11]. However, determination of 
boundaries between texture regions requires further 
processing. Recently, attempts have been made to 
develop estimation theory-based boundary detectors 
[1,4]. 

This paper describes the design, 
implementation and performance of an estimation 
theory-based segmentation operator for noisy images 
which contain regions of uniform intensity as well as 
texture regions. The operator performs segmentation 
at the signal level and assumes no a priori 
knowledge about the images considered. Its essence is 
the incorporation of region based segmentation with 
curve fitting to find boundaries between perceptually 
distinct regions in an image. First, dominant regions 
which exhibit uniformity, called cue regions, 
are identified. Then, boundaries lying 
between cue regions are estimated 
globally. Boundaries are approximated by a complete 
cubic spline function. The flexibility of boundary 
approximation is achieved by adding knots 
at positions where boundary location changes 
abruptly. 

The function of this segmentation operator 
corresponds to early vision in humans and it is 
viewed as a part of a multi-level modular 
segmentation scheme. At the first level of the 
segmentation hierarchy this operator is employed to 
perform rough segmentation of an image. The 
result obtained is then used as input to subsequent 
levels of segmentation, which take advantage of 
knowledge about the scene domain and are task 
dependent. Their function is to perform refined 
segmentation, label cue regions and identify smaller 
objects that are of interest. 

The method underlying identification of cue 
regions is described in Section 2. Boundary 
approximation and estimation schemes are the 
subject of Section 3. Results and future work are 
discussed in Section 4. 

2. IDENTIFICATION OF CUE REGIONS 

The first task of the segmentation operator is to 
identify cue regions and intermediate zones where 
boundaries between cue regions lie. Consequently, 
the problem of boundary estimation between regions 
of unknown properties reduces to the problem of 
boundary estimation in the ambiguity zone between 
two regions of known properties. 

Since most textures appearing in nature are 
viewed as "uniform" only as a whole, while locally 
they exhibit various statistical and structural 
irregularities, images are subjected to pre-processing 
prior to region identification. The purpose of this 
procedure is to eliminate minor textural detail and 
map texture regions into regions which exhibit a 
higher degree of ergodicity (Figure 1(b)). This task is 
accomplished by using a Gaussian filter applied locally 
over a pixel neighborhood; as a result, local image 
variances decrease as a function of σ. The filter is 
implemented by using the method of hierarchical 
discrete correlation [3], which performs filtering in 
stages and approximates the Gaussian by weighted sums 
over small neighborhoods. 

The identification of uniform regions involves 
generation of a T image in which each pixel (x,y) is 
replaced by 

$$ T(x,y) = \frac{1}{(2P+1)(2Q+1)} \sum_{q=-Q}^{Q} \sum_{p=-P}^{P} \left| \mu - I(x+p,\, y+q) \right| $$

Generation of a T image is a form of primitive 
segmentation since T(x,y) is constant within a 
homogeneous region and increases at the boundary 
between two homogeneous regions. (Examples of T 
images are shown in Figure 1 (c)). Cue regions and 
ambiguity zones where boundaries between cue 
regions lie can be easily extracted from a T image 
using thresholding techniques. Results obtained by 
using the method proposed by Otsu [9] are shown in 
Figure 1 (d). 



Figure 1. Cue region identification: 

(a) original images, (b) filtered images, 
(c) T images, (d) identified ambiguity 
zones. 
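
A T image can be sketched in plain Python as below. The exact summand of the T(x,y) formula is hard to read in this copy, so the sketch assumes that T is the mean absolute deviation of the intensities in a (2P+1) by (2Q+1) window from the window mean; that reading, the function name `t_image` and the choice to leave border pixels at zero are all assumptions.

```python
def t_image(image, P=1, Q=1):
    """Local non-uniformity measure over a (2P+1) x (2Q+1) window.

    image : list of rows (lists of numbers), indexed image[y][x]
    Assumption: T is the mean absolute deviation from the window mean.
    Border pixels, where the window does not fit, are left at 0.0.
    """
    rows, cols = len(image), len(image[0])
    area = (2 * P + 1) * (2 * Q + 1)
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(Q, rows - Q):
        for x in range(P, cols - P):
            window = [image[y + q][x + p]
                      for q in range(-Q, Q + 1)
                      for p in range(-P, P + 1)]
            mu = sum(window) / area
            out[y][x] = sum(abs(mu - v) for v in window) / area
    return out
```

Under this reading T is zero inside a region of constant intensity and rises where the window straddles two regions, which matches the behaviour described for the T image.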


In essence, the described procedure is a region 
merging method where intensity in the T image is 
used as the merging criterion. However, it does not 
employ region splitting; instead, pixels which cannot 
be assigned to cue regions constitute ambiguity zones 
where boundaries between cue regions lie. This 
design decision is governed by the fact that this 
operator aims at estimation of boundaries between 
dominant and thus large regions which exhibit 
statistical homogeneity. Furthermore, the reliability of 
statistical measurements for texture regions falls 
as region size decreases. 

3. BOUNDARY ESTIMATION 

A boundary f(y) (Figure 2) is approximated, as 
in an earlier work [1], by a cubic spline function g(y) 
with (n + 1) knots. The function g(y) is explicitly 
determined by the function values at the knots and 
derivatives at the interval ends, i.e. 

$$ g(y) = \sum_{m=1}^{n+1} K_{m,i}(z)\, f_m + A_i(z)\, f'_1 + \Omega_i(z)\, f'_{n+1} \qquad (1) $$

The task of the boundary estimator is then to estimate 
$f_i$, $i = 1, 2, \ldots, n+1$, and $f'_j$, $j = 1, n+1$. This task is 
accomplished by the linear recursive filter [7] under the 
assumption that the state vector evolves according to 

$$ x_k = x_{k-1} + w_k , $$

where $w_k$ is the process noise and 

$$ x^T = [\, f_1 \;\; f_2 \;\; \cdots \;\; f_{n+1} \;\; f'_1 \;\; f'_{n+1} \,] . $$


Vector x is estimated by considering M 
simultaneous measurements in the ambiguity zone 
between two uniform regions. The measurement 
model takes the form 

$$ z_k = H_k x_k + v_k \qquad (2) $$

where $H_k$ is the measurement matrix and $v_k$ is the 
measurement noise. A measurement performed in 
the ambiguity zone (in the pre-processed image between 
regions $R_1$ and $R_2$) is modeled as 

$$ z = \frac{p_1 s + (\delta - s)\, p_2}{\delta} + \eta \qquad (3) $$

where $p_1$ and $p_2$ are measured properties of regions $R_1$ 
and $R_2$, respectively, $\eta$ is the measurement noise and 
$\delta$ and $s$ are as shown in Figure 2. Based on equations 
(1), (2) and (3) the measurement matrix is 

$$ H = \frac{p_1 - p_2}{\delta}
\begin{bmatrix}
K_{1,1} & K_{2,1} & \cdots & K_{n+1,1} & A_1 & \Omega_1 \\
K_{1,2} & K_{2,2} & \cdots & K_{n+1,2} & A_2 & \Omega_2 \\
\vdots  & \vdots  &        & \vdots    & \vdots & \vdots \\
K_{1,M} & K_{2,M} & \cdots & K_{n+1,M} & A_M & \Omega_M
\end{bmatrix} $$

Figure 2. Boundary estimation nomenclature. 
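
To make the estimation step concrete, here is a scalar sketch of one cycle of a linear recursive filter for a random-walk state. It assumes the filter of [7] has the standard Kalman form; the paper's estimator is vector-valued, with the measurement matrix built from the spline coefficients, whereas this sketch reduces state and measurement to single numbers. The function name and argument names are hypothetical.

```python
def recursive_filter_step(x, p, z, h, q, r):
    """One predict/update cycle for a scalar random-walk state.

    x, p : prior state estimate and its variance
    z    : new measurement, modelled as z = h * x + noise
    h    : measurement gain (scalar stand-in for the measurement matrix)
    q, r : process and measurement noise variances
    """
    # Predict: x_k = x_{k-1} + w_k keeps the mean and inflates the variance.
    p_pred = p + q
    # Update: blend prediction and measurement using the filter gain.
    gain = p_pred * h / (h * h * p_pred + r)
    x_new = x + gain * (z - h * x)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new
```

Repeated calls with measurements of a constant quantity drive the estimate towards that quantity while the variance shrinks, which is the behaviour the boundary estimator relies on at each knot.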


The success of the described scheme depends on 
two major factors: (1) choice of the property p to be 
measured and (2) positioning of the knots. Various 
measurement schemes are described in [2]. (Results 
shown in this paper are generated by using the mean 
measured along adjacent strips in regions $R_1$, $R_2$ and 
the ambiguity zone.) In this paper we shall consider the 
role of knot positions. 

The approximation of functions by splines is 
well known to depend on the number and positions of the 
knots. Smooth curves, such as those considered in [1], 
can be adequately approximated by a small number of 
equidistant knots. However, if abrupt changes are 
expected it is desirable to use a large number of 
knots, and it is particularly important that the knots 
be placed at the positions where significant changes 
occur. One approach to obtaining the desired flexibility is 
to consider knots to be free parameters. Different 
schemes for handling variable knots have been 
described by Jupp [6] and De Boor [5]. Muth and 
Willsky [8] have designed a variable knot recursive 
scheme to approximate wave forms by splines. This 
method uses an age-weighted linear recursive filter 
to change the locations of the knots. The scheme is 
designed to work on functions which have a large 
number of jumps. 

In this work the assumption is that boundaries
are generally smooth and that abrupt changes occur
infrequently. Therefore, it is computationally more
efficient to achieve the flexibility of boundary
approximation by adding knots rather than by
varying the locations of a given number of knots.
The boundary estimation procedure starts with (I + 1)
equidistant knots, and a new knot is added to
segment h at position r if z_r - ẑ > Δ (Δ is the
threshold value and z_r is the measurement taken
along the strip located at r). In this way new knots are
added at the positions where abrupt changes in the
first derivative occur or where maxima and minima
are expected. The reliability of the measurement is
increased by considering m previous and n new
measurements around position r. While it is possible
to consider a large number of previous and new
measurements, we have found m = n = 2 to be
sufficient. The dimensionality of the vectors
(equations (1), (2), (3)) is increased by one for each
additional knot and all vectors are appropriately
augmented. The final boundary is approximated by
N + 1 > I + 1 knots. The addition of new knots is
illustrated in Figure 3, while the results obtained are
shown in Figure 4.
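
The knot-addition rule can be sketched as follows. This is only a
minimal illustration under stated assumptions, not the estimator
used to produce the reported results: the name `add_knots`, the
list `predicted` standing for the current spline estimate at each
strip, the plain averaging of the residual over the m previous
and n new strips, and the use of an absolute value in the
threshold test are all assumptions made for the example.

```python
def add_knots(knots, measured, predicted, threshold, m=2, n=2):
    """Return the sorted knot positions after adding new knots.

    knots     -- strip indices of the current (initial) knots
    measured  -- measurement z taken along each strip
    predicted -- value of the current estimate at each strip (assumed input)
    threshold -- threshold value for the residual
    m, n      -- number of previous and new measurements considered around r
    """
    new_knots = set(knots)
    for r in range(len(measured)):
        lo = max(0, r - m)
        hi = min(len(measured), r + n + 1)
        window = range(lo, hi)
        # Assumed rule: average the residual over the window so that a
        # single noisy strip does not trigger a new knot on its own.
        residual = sum(measured[i] - predicted[i] for i in window) / len(window)
        if abs(residual) > threshold:
            new_knots.add(r)
    return sorted(new_knots)


measured = [0] * 20
measured[9] = measured[10] = measured[11] = 9   # a local abrupt change
print(add_knots([0, 19], measured, [0] * 20, threshold=5))
```

In a recursive estimator the prediction would be updated after each
added knot; the sketch evaluates all strips against one fixed
prediction for brevity.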




Figure 3. Addition of new knots: (a) initial knots,
(b) added knots (♦ represents added knots and * initial knots).


4. CONCLUSIONS 

Performance of the described boundary
estimator has been evaluated on noisy images
containing regions of uniform intensity as well as on
images containing texture. Generally, the results
obtained are in good agreement with human
perception. The addition of new knots, when required,
allows good boundary approximation in cases where
the boundary location changes abruptly, without adding
a significant computational burden. Development of
the method is the subject of further research. At present
we are investigating the use of multiple properties in
the measurement scheme, i.e. both statistical and
texture properties, to increase the reliability of the
measurements.





Figure 4. Examples of obtained boundaries in noisy 
images. 
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ABSTRACT 

Areas of specular reflectance, or highlights, can be
analyzed for a wide range of information or clues that
they give us about a scene. This paper presents a local
algorithm for analyzing a moderately unconstrained color
image to determine the areas of specular reflectance.
The algorithm is based on the separability of the diffuse
and specular reflection components by differential
methods.

The locations of specular reflectances are marked by
finding zero-crossings in concave down regions for
two-dimensional arrays of intensities representing the color
image. These zero-crossings correspond to the centers
of the highlight regions. The highlight centers are then
expanded to highlight regions by region growing in a
direction orthogonal to the local orientation of the
highlight. Thus, at the conclusion of the algorithm, the
information known about each highlight includes location,
size and direction.

INTRODUCTION 

Computer Vision is often perceived as something that
should be trivial. The reason for this perception is that
we are ourselves so good at vision that we take the whole
process of vision for granted. In fact, the interpretation
of our three dimensional world, as portrayed in a two
dimensional array of intensities, is anything but trivial.
While humans bring a vast amount of ‘intrinsic’ information
to bear on the problem of image analysis, the computer
does not have the capacity, at the current time, to
perform the same feat. Therefore, to permit any useful
analysis of an image whatsoever, we tend to limit, or
constrain, our image world such that analysis becomes
feasible with respect to the limited amount of knowledge
we can impart to the computer.

A particularly useful and efficient task that we practice
every day, however unwittingly, is that of distinguishing
between objects made from different materials. An
important prerequisite for such perception is the ability to
discern the quality of an object’s appearance. Various
qualities of appearance are apparent in the world around
us, such as texture, color, shine, luster, etc., all of which
give us important clues as to an object’s composition.
This paper will concern itself with the quality of surface
gloss.

Glossiness, in general, is correlated with specular
reflectance [Beck 72]. By looking at picture 1 we can easily
determine which objects are shiny by the presence of
areas of specular reflectance.

Surface sheen, shine, gleam, etc. (see [Wyszecki 75] for
a discussion of these terms) is a very important aspect
in material discrimination. We regard metals as having
a shiny appearance, whereas plastics, while they may
have as smooth a finish, appear somewhat dull in
comparison. Other surfaces may be altogether matte. These
differences are caused by the presence, or absence, of
local mirror-like or specular regions of reflected light,
henceforth called highlights. If we can detect these
highlights within an image, we can glean information that will
help us to identify materials.

Some other consequences of finding highlights are that:
(i) they would aid in constraining the size, color, and
location of a light source; (ii) they would simplify object
recognition or matching by identifying the regions so that
some of the effects of the illumination could be ‘factored
out’; and (iii) they would also enable constraints to be
placed on object size and location [Thrift 82]. Perhaps a
more basic or fundamental reason for wanting to locate
highlights is that computer vision is concerned with
modeling human vision, of which an inherent feature is the
ability to locate highlights.

The detection of highlight regions proceeds by examining
the stimuli that create the sensation of a highlight.
Horn’s [Horn 75] model of surface reflectivity describes
the two basic reflection components from a surface,
specular and diffuse, as being separate quantities.
Forbus [Forbus 77] used this information to generate a
series of one dimensional profiles of intensity for curved
surfaces, to see what parameters are relevant to the
perception of highlights in achromatic images. Forbus
noted that both the specular and diffuse reflection
components must be present to create the sensation of a
highlight. Using this information, in conjunction with the
expected sinusoidal shape of the specular reflection
component as given in Horn’s equation, we can detect
highlights.

1. DIFFERENTIAL OPERATORS 

The sinusoidal shape of the specular component of
reflection can be used to locate highlights by looking for
zero-crossings in the first differential of the intensity
image (see figure 1-1). Since zero-crossings in the first
differential can correspond to either maxima or minima
in the original image, care needs to be taken that we
accept only those zero-crossings for which we have a
maximum in the original image. Consequently, a concavity
check is made by taking the second differential of the
intensity image to make sure we have a peak of intensity
and not a trough.
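
The idea can be illustrated on a one dimensional intensity
profile. The sketch below is an assumption-laden simplification:
it uses plain finite differences instead of the Gaussian-based
masks described next, and the name `highlight_peaks` is
hypothetical.

```python
def highlight_peaks(profile):
    """Return indices of intensity peaks in a 1-D profile.

    A peak is a zero-crossing of the first differential that lies in a
    concave down region (negative second differential).
    """
    # First differential by forward differences: d1[i] = profile[i+1] - profile[i].
    d1 = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
    peaks = []
    for i in range(1, len(d1)):
        # Sign change of the first differential between d1[i-1] and d1[i].
        crosses = (d1[i - 1] > 0 and d1[i] <= 0) or (d1[i - 1] < 0 and d1[i] >= 0)
        # Second differential at sample i; negative means concave down.
        d2 = d1[i] - d1[i - 1]
        if crosses and d2 < 0:
            peaks.append(i)
    return peaks


# One peak (value 7 at index 3) followed by a trough (value 1 at index 6).
print(highlight_peaks([1, 2, 4, 7, 4, 2, 1, 3, 5]))
```

The trough also produces a zero-crossing of the first differential,
but it is rejected by the concavity check.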

By using the first derivative of the Gaussian as our
operator, we incorporate both a smoothing and a
differential operator into one step. The form of our operator is a
two dimensional mask of values, calculated to simulate
the derivative of the Gaussian, which can then be
convolved with the intensity image. Since our convolution
operator is now a function of σ, we can vary the
sensitivity by using larger or smaller values for σ.

While we may choose to apply our differential operator
using a single value for σ, that would be inappropriate
considering the wide range of highlight sizes that may
occur in an image. Highlights with a wide range of sizes
can only be located by using multiple values for σ and
then ORing all the results together. To ensure that we
keep only zero-crossings not caused by singularities due to
the choice of σ, the algorithm incorporates a phase
which finds the zero-crossings for two values of σ and
then ANDs those images together. Taking two values of
σ relatively close together ensures that when we AND
the results we do not eliminate ‘true’ highlights due to
the scale of our operator. We can then OR a few such
σ pairings to cover the gamut of highlight sizes.
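
The AND/OR combination across scales can be sketched as below.
The zero-crossing maps are taken as given (boolean arrays, one per
operator scale); the function name `combine_scales` and the
dictionary layout are assumptions made for the illustration.

```python
def combine_scales(zero_crossings, scale_pairs):
    """OR together the AND of the zero-crossing maps of each scale pair.

    zero_crossings -- dict mapping an operator scale to a 2-D list of booleans
    scale_pairs    -- list of (scale_a, scale_b) tuples with close values
    """
    first = next(iter(zero_crossings.values()))
    rows, cols = len(first), len(first[0])
    result = [[False] * cols for _ in range(rows)]
    for scale_a, scale_b in scale_pairs:
        map_a, map_b = zero_crossings[scale_a], zero_crossings[scale_b]
        for r in range(rows):
            for c in range(cols):
                # Keep a zero-crossing only if both close scales agree on it.
                if map_a[r][c] and map_b[r][c]:
                    result[r][c] = True
    return result


maps = {
    1.0: [[True, False, True]],
    1.2: [[True, True, False]],
    3.0: [[False, False, True]],
    3.5: [[False, True, True]],
}
print(combine_scales(maps, [(1.0, 1.2), (3.0, 3.5)]))
```

The first pixel survives through the small-scale pair and the
third through the large-scale pair; the middle pixel is never
confirmed by both members of a pair and is dropped.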

However, we must remember that the differential mask
is directional (non-isotropic) and must be applied to the
image oriented at various angles θ. It turns out that
the convolution of the differential mask with the image
need only be done twice, for two orthogonal angles,
since the other angular orientations can be derived from
those results. Complete zero-crossing angular orientation
information can be ascertained from four angles
θ, each forty-five degrees apart. This means that two
subsequent calculations need to be performed on the
results of the orthogonal angle convolutions. Once we
have the zero-crossing information for our differential
masks the results can be combined with a concavity
check to ensure we keep only those zero-crossings for
peaks of intensity.
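
The text does not spell out how the other orientations are
derived. A common way, assumed here, is the identity for a first
derivative operator: the response at angle t equals
cos(t) times the x response plus sin(t) times the y response.
The sketch below applies it to two precomputed orthogonal
responses; `directional_responses` is a hypothetical name.

```python
import math


def directional_responses(dx, dy):
    """Combine two orthogonal derivative responses into four orientations.

    dx, dy -- 2-D lists holding the responses of the mask at 0 and 90 degrees
    Returns a dict mapping 0, 45, 90 and 135 degrees to a response array.
    """
    responses = {}
    for degrees in (0, 45, 90, 135):
        theta = math.radians(degrees)
        c, s = math.cos(theta), math.sin(theta)
        # Assumed identity: D_theta = cos(theta) * D_x + sin(theta) * D_y.
        responses[degrees] = [
            [c * gx + s * gy for gx, gy in zip(row_x, row_y)]
            for row_x, row_y in zip(dx, dy)
        ]
    return responses


r = directional_responses([[1.0, 0.0]], [[0.0, 2.0]])
print(r[45], r[135])
```

Only the 45 and 135 degree responses require new arithmetic, which
matches the two subsequent calculations mentioned above.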

Concavity is determined by taking the second derivative
with respect to a curve and looking at the sign of the
result. In our case, this means convolving the original
image with a two dimensional, second differential mask.
The mask is formed in the same manner as the first
differential directional mask, that is, by differentiating
the Gaussian, with the exception that the Laplacian is
used rather than the directional derivative
so that the convolution need only be done once
[Marr 75]. The result of the convolution is a two
dimensional array composed of positive and negative regions
demarcating concave up and concave down regions
respectively. Only those zero-crossings within negative
valued regions are accepted for further analysis. The
result is an image containing zero-crossings for the
original monochromatic image. There are still two other
monochromatic images left from the original RGB
images to process. Color is used to corroborate the data
[Kanade 81] so that we have a ‘true’ highlight
identification system. The complete zero-crossing information for
the RGB image is shown in picture 2. Since zero-crossing
chains are only one pixel wide, it is necessary to
incorporate a region growing phase into the algorithm to
locate the whole highlight.

2. REGION GROWING 

The highlight pixels that surround the previously located
highlight center chain can be found by growing
outwards, from that central chain, in a locally orthogonal
direction. To ascertain which direction is orthogonal, the
local direction of the highlight center chain is found over
a three-by-three mask. The growing proceeds while the
results of convolving a growing one dimensional
orthogonal mask with the original image are increasing.

By convolving the power-of-two mask in figure 2-1 with
the binary valued highlight chain image (1 for highlight
pixels and 0 otherwise) and examining the resulting
numeric value, we can determine the local direction.
Using the eight-connected neighbor model there are four
directions: north-south, east-west, southwest-northeast
and northwest-southeast. For example, if the convolution
result is 17 or 68, the highlight center chain pixel is
tagged with an east-west or north-south flag respectively.
Similarly, other values demarcate various compass
directions. From the local direction labelling we can also label
each of the pixels with a corresponding orthogonal
direction. Choosing the local direction can be impossible for
some patterns that can arise in a three-by-three area, so
these pixels receive special treatment.

These pixels are labeled ‘blob’ pixels since they are
without definite direction. To enable processing, they are
marked with a flag that states every direction is
orthogonal, and thus they are processed for each of the four
directions. Once all the highlight center chain pixels
have been tagged for direction, the orthogonal direction
highlight growing can proceed.
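
The direction tagging can be sketched as follows. The exact
layout of the power-of-two mask in figure 2-1 is not reproduced
here, so the weights below are an assumption, chosen only so that
an east-west chain gives 17 and a north-south chain gives 68 as
stated above. Treating every other code as a blob is a further
simplification, and `local_direction` is a hypothetical name.

```python
# Assumed power-of-two weights for the eight neighbours, keyed by
# (row offset, column offset); rows increase towards the south.
WEIGHTS = {
    (0, 1): 1,      # east
    (1, 1): 2,      # south-east
    (1, 0): 4,      # south
    (1, -1): 8,     # south-west
    (0, -1): 16,    # west
    (-1, -1): 32,   # north-west
    (-1, 0): 64,    # north
    (-1, 1): 128,   # north-east
}

DIRECTIONS = {
    17: "east-west",
    68: "north-south",
    34: "northwest-southeast",
    136: "southwest-northeast",
}


def local_direction(chain, row, col):
    """Return (code, label) for the chain pixel at (row, col).

    chain -- binary image: 1 for highlight centre chain pixels, 0 otherwise
    """
    code = 0
    for (dr, dc), weight in WEIGHTS.items():
        r, c = row + dr, col + dc
        if 0 <= r < len(chain) and 0 <= c < len(chain[0]) and chain[r][c]:
            code += weight
    # Any pattern without a definite direction is labelled a blob.
    return code, DIRECTIONS.get(code, "blob")


print(local_direction([[0, 0, 0], [1, 1, 1], [0, 0, 0]], 1, 1))
print(local_direction([[0, 1, 0], [0, 1, 0], [0, 1, 0]], 1, 1))
```

Because each neighbour contributes a distinct power of two, the
summed code identifies the neighbourhood pattern uniquely.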

Orthogonal to the center chain pixels, which identify the
peak of the highlight, the highlight intensity values
decrease until reaching the value of the diffuse intensity for
that surface. Therefore, a one dimensional mask,
oriented in the orthogonal direction, is set up to calculate
the difference between a highlight center chain pixel and
points on each side of it (each side is treated separately
due to the non-symmetrical shape of most highlights). At
each iteration of the algorithm the size of the mask is
increased by one as the edge of the mask is moved
‘outwards’ by one pixel.

The algorithm keeps iterating for each highlight pixel
as long as the calculated value is increasing.
When the algorithm terminates, the current value of the
mask size determines the diameter of the highlight for
the side it was working on. Since blob pixels are
processed for each direction, the current value of the
mask size is stored so that it can be compared to the
subsequent mask size values. The final mask size
chosen for blob pixels will be the minimum of the directional
mask sizes. The blob pixel will then be marked with the one
of the four directions that corresponds to the minimum
sized mask (see picture 3 for a scene with the highlight
diameters used to generate disks around the highlight
center pixels). We now have each highlight chain pixel
tagged with a highlight diameter to each side of it and a
direction. The highlight is thus completely determined.
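
Growing along one side of a center chain pixel can be sketched as
below. The sketch assumes that the calculated value is simply the
difference between the center pixel and the outermost pixel
covered by the mask; the function name `grow_side` and its
arguments are hypothetical.

```python
def grow_side(image, row, col, step):
    """Return how many pixels the highlight extends on one side.

    image -- 2-D list of intensities
    step  -- (d_row, d_col) unit step pointing away from the centre chain,
             along the direction orthogonal to the chain
    """
    centre = image[row][col]
    size = 0
    previous = 0
    r, c = row + step[0], col + step[1]
    while 0 <= r < len(image) and 0 <= c < len(image[0]):
        difference = centre - image[r][c]
        if difference <= previous:
            break  # the value stopped increasing: diffuse level reached
        previous = difference
        size += 1
        r, c = r + step[0], c + step[1]
    return size


profile = [[2, 2, 3, 5, 9, 6, 4, 4, 4]]
print(grow_side(profile, 0, 4, (0, 1)), grow_side(profile, 0, 4, (0, -1)))
```

Each side is grown separately, so a non-symmetrical highlight
yields different extents to the east and to the west of the peak.
For a blob pixel the same routine would be run for each of the
four directions and the minimum size kept.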

3. CONCLUSION 

An algorithm is described in this paper which locally
processes scenes of various objects to determine areas of
specular reflectance or highlights. The algorithm is based
on the separability of the specular from the diffuse
reflectances by differential methods. Once the highlight center
chains are detected by the algorithm, they are expanded
to highlight regions by region growing in a direction
orthogonal to the local orientation of the highlight. At the
conclusion of the algorithm the information known about
each highlight includes location, size and direction. Thus
the algorithm provides information that a Computer
Vision system must make use of when analyzing, or
understanding, a scene. In addition, this algorithm
can be used as a preprocessor to
image processing algorithms that rely on predetermined
areas of specular reflectance. It can also be used to
identify areas of specular reflectance so that they can be
removed from the scene. This eliminates illumination
peculiarities which might confuse later algorithms
that do pattern matching.

The fact that the algorithm is successful using
moderately unconstrained images is important since it
decreases the gap between the world that the computer
can now understand and the extremely complicated one
in which we live.
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Figure 1-1: Highlight type curve and its derivative 
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Figure 2-1: Power-of-two mask 







[Babu 85] 




[Barrow 81] 



[Beck 72] 




[Bryant 83a] 


Picture 1: Image containing objects with highlights 
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RESUME 

A software tool is proposed to allow the interactive
study of the spatio-temporal characteristics of
handwritten signatures. The tool bridges the visual
appearance of a signature and the dynamic
characteristics related to its execution. Graphic
editing commands and numerical processing commands
are available to the user to, respectively,
manipulate and modify on the screen the various
representations related to a given signature.

ABSTRACT 

A software tool is proposed for the interactive
study of spatio-temporal characteristics related to
handwritten signatures. The tool fills the gap
between the visual and dynamic aspects of a signature
with specific graphic editing commands that can be
used to manipulate on the screen the various
representations of features related to a given
signature. Also, useful data processing commands
allow modification of the content of these
representations.

I- INTRODUCTION 

Handwriting is a rather complex mechanism which
results in the generation of line images. These
images can be analysed by different recognition
techniques based either on the visual output of the
process or on the dynamic information acquired by a
specific set-up during the process itself. Among
the different classes of problems dealing with
handwriting recognition, signature verification has
received growing attention over the past ten
years in the field of computer security. Indeed,
with the increase in the number of electronic funds
transfers and other forms of computer access, the need
for an Automatic Personal Identification (API)
system has become a major priority.

Signature verification techniques offer
different advantages over other identification
techniques (passwords, PINs, magnetic cards,
fingerprints, voice, ...). It is an accepted and
easily tested method. A signature cannot be lost or
stolen and it can hardly be imitated dynamically.
In the past fifteen years, intensive research has
been carried out [3,4,5,6,7,8] to implement API systems
based on signatures. But none of the systems already
proposed in the literature has settled the question.
Indeed, handwriting a signature is a
complex task requiring great muscular skill,
and we believe that further fundamental research on
the handwriting phenomenon is needed to improve the
performance of these systems.

Different tools and methods exist that help
research in handwriting. In psychology, for
example, measurements of reaction time or movement
time are often used to verify presumptions about a
specific handwriting task (e.g. identification of a
movement unit in handwriting: the stroke). In this
paper, we present an interactive software tool
based on the possibility of time coupling
between visual information (that is, the signature
itself on the paper) and any type of dynamic
information based on data sampled during the
execution of the handwriting task.

In this contribution, we recall in section II
the two classes of features involved in signature
verification (visual and dynamic) and point out
their complementarity. We present in section III
an overview of the features (technical and
functional) implemented in the software tool. In
section IV, we give a typical application showing
the utility of the tool in the interactive analysis of
a handwriting task, specifically a signature.

II- VISUAL VERSUS DYNAMIC INFORMATION 

Two major classes of features dealing with
signatures can be used as input for an API system:

- the visual information (that is, the final result
on the paper);

- the dynamic information (that is, the sequence in
time of any measurable (but meaningful)
characteristic).

Optical analysis of signatures is a useful
technique in off-line verification applications such as
document expertise. But, sometimes, in order to
pronounce a correct verdict about the authenticity
of a signature, an expert needs to gather dynamic
features from the static representation of a
signature by examining it under a microscope (the
aspect of the paper fiber where the pen has passed,
the variation in the thickness of the line, ...).
However, these techniques cannot be automated (at
least in the near future) and are thus unusable for
API applications.
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On the other hand, dynamic signals can be 
processed and analysed by different techniques,
namely the usual signal processing approaches
(filtering, correlation, spectral analysis, time
series, etc.). But these methods rarely take into
account the visual origin of these specific
signals, and questions like:

"Which parts of a signature were traced more
rapidly than a speed threshold?"

"Are the amplitude variations of the signal
more determining in the shaping of a given
letter than are frequency variations?"

"Are rapid movements more accurate and more easily
repeated than slower ones?"


remain unanswered because only one aspect of the
available information is usually analysed. We
believe that signatures must be analysed by
coupling the visual and dynamic information
together. This requires that position and other
dynamic information be sampled simultaneously
during the execution of a signature.

The sources of information related to a signature
can be of various kinds:


- Those issued directly from transducers at
execution time. The transducers can be in the
writing pen (such as strain gauges, accelerometers,
etc.), or under the writing surface
(a digitizer, or a digitizer with analog computation [1]).


- Those obtained indirectly after computation
from the sampled data (first and second time
derivatives of position, radius of curvature,
instantaneous frequency of the signal, or any other
interesting feature).


But the main requirement for time coupling
is that all sources of dynamic information be
sampled simultaneously by an adequate set-up.




III- THE SOFTWARE TOOL

To illustrate the idea behind the
expression "time coupling", a handwritten
signature has been sampled (at 200 Hz) by a
digitizer (model MM960, Summagraphic Inc.). One can
redraw the signature on a graphic screen by
tracing the X(t) sequence vs the Y(t) sequence, as
in figure 1, where the bottom signal is X(t) while
the one above is Y(t).

The effective coupling of information can be
made internally by software because we know "when"
each upstroke and downstroke was made. To see
a desired coupling with the software tool, the user
only has to position a moving cross on the screen,
with the cursor arrows on the keyboard (or with a
mouse), on a point of interest (e.g. a glitch on the
X(t) signal or a particular stroke on the
signature) and press the space bar on the
keyboard (or a button on the mouse). The software
automatically finds the point of the graph that is
nearest to the cross, and indicates with
numbered vertical line(s) all the couplings in time
occurring on the graphs that belong to the same
signature.
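
The core of the coupling operation can be sketched as a
nearest-point search that returns a sample index shared by all
channels. This is an illustration in Python rather than the
FORTRAN of the actual tool; the function name `couple` and its
arguments are assumptions, and only the 200 Hz sampling rate is
taken from the text.

```python
def couple(xs, ys, cross_x, cross_y, sample_rate_hz=200.0):
    """Return (index, time in seconds) of the sample nearest to the cross.

    xs, ys -- X(t) and Y(t) sequences sampled simultaneously
    The returned index locates the same instant on every other channel.
    """
    index = min(
        range(len(xs)),
        key=lambda i: (xs[i] - cross_x) ** 2 + (ys[i] - cross_y) ** 2,
    )
    return index, index / sample_rate_hz


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 0.0, 1.0]
index, t = couple(xs, ys, 2.1, 0.2)
print(index, t, xs[index], ys[index])
```

Because every channel is sampled simultaneously, the index found
on the signature can be used directly to place the vertical lines
on the X(t) and Y(t) graphs.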


For example, the underlying "timing" of the
signature in figure 1 can be retrieved by coupling
ten samples of the position signal
(e.g. one every 100 samples, each representing one
tenth of the total duration of the signature) to
their effective positions in 2D (figure 2). The
numbered vertical lines indicate the ten couplings.

Several functions have been implemented alongside
the coupling function, but before saying more
about them, the following describes the technical
features of the software.

Technical features 

The software is written in FORTRAN 66 (about
6000 statements, or about 230 Kbytes of compiled
code, running on an IBM 4381 mainframe). It uses the
T.C.S. library of subroutines and so it must be run
from a TEKTRONIX 4010/4014 terminal (or a Tektronix
emulator). At execution, an additional 170 Kbytes
are required for constants and workspace.

At the data level, the working area consists of
two regions (for up to two signatures), each
divided into seven channels. One channel can hold
up to 2048 data points (e.g. a ten-second signature if
sampled at 200 Hz). At start-up, the program reads
a data file into the system and puts it in one
region of the working area. If the file contains
only three non-empty channels (e.g. X position, Y
position, and pressure), four channels are then
left empty and can be filled with signals that
can be modified by one of the 19 data processing
commands (see functional features).

At the graphical level, the software can trace
on the same screen:

- up to 10 figures associated with the first file,
and/or

- up to 10 figures associated with the second file,
and/or

- up to 10 figures associated with the two files
together, in order to make comparisons
(e.g. correlation between files).

Each of these graphics can be modified in
different ways by the 17 graphics commands, as
described next.

Functional features 

The implemented functions are divided as
follows:

- 19 data processing commands

- 17 graphic editing commands

As we have said, each channel of the working
area can be modified by the data processing
commands. These commands are menu driven. For
example, one can filter, differentiate, interpolate or
even take an FFT of a signal, or else perform
operations between signals (add, multiply,
correlate, ...). One can also add a customized
application to the software; it is easily
expandable.

Also, each figure drawn from these channels can
be interactively edited via the graphic editing
commands. These commands are called with a
one-character mnemonic and use a moving cross to
manipulate the figures on the screen. Most of the
graphics commands concern manipulations of two
kinds of windows: a "virtual" window which
determines the data to be displayed, and a
"display" window which determines where, on the
screen, the selected data will be plotted. Commands
are available to modify the number, the disposition
and the content of the windows to be drawn on the
screen. It is also possible to make measurements by
superimposing a user-defined reference grid on a
window, or to make qualitative comparisons by
superimposing another window.

Obviously, we cannot discuss here the details
of each of the implemented functions, but we can give
an application example using some commands in order
to give an insight into the way the program
progresses through the commands.

IV- APPLICATION 

Suppose we have a signature sampled with a
given digitizer. We want to know where, on the
signature, the line has been traced at a speed
higher than an arbitrary threshold value.



Figure 3 


To answer this question, the following
procedure will be used:

1. Read the file where the X and Y coordinates of
the signature are stored.

2. Make a copy of the position signals (X(t), Y(t))
in two empty channels of the working area.

3. Display the signature to see if it is necessary
to filter the signals (figure 3).

4. Filter if necessary (yes in our case).

N.B.: Two kinds of filter are available in the
program:

- frequency sampling FIR filter

- moving weighted average filter.

5. Differentiate one pair of position signals to obtain
the speed relative to each direction (X'(t) and
Y'(t)).

N.B.: We use finite central difference calculus for
fifth-order polynomials [12].



Figure 4 




6. Transform the (X'(t), Y'(t)) signals into |V(t)|,
the absolute speed value.

7. Display the signature and the |V(t)| signal
(figure 4).

8. Choose the threshold speed value (below which
the signature will be traced with a dotted line)
with the specific graphic editing command.

9. Display the signature and |V(t)| again (figure 5).
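
The numerical part of this procedure (differentiation, absolute
speed, thresholding) can be sketched as follows. The sketch is in
Python rather than FORTRAN, skips the filtering step, and
substitutes a simple two-point central difference for the
fifth-order polynomial formula used by the tool; the function
names `speed` and `fast_segments` are hypothetical.

```python
import math


def speed(xs, ys, sample_rate_hz=200.0):
    """Absolute pen speed at each sample (needs at least two samples).

    Uses two-point central differences, one-sided at both ends.
    """
    dt = 1.0 / sample_rate_hz
    v = []
    for i in range(len(xs)):
        lo, hi = max(i - 1, 0), min(i + 1, len(xs) - 1)
        span = (hi - lo) * dt
        vx = (xs[hi] - xs[lo]) / span   # speed along X
        vy = (ys[hi] - ys[lo]) / span   # speed along Y
        v.append(math.hypot(vx, vy))
    return v


def fast_segments(v, threshold):
    """Return inclusive (start, end) index pairs where speed exceeds threshold."""
    segments, start = [], None
    for i, value in enumerate(v):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(v) - 1))
    return segments


xs = [0, 0, 0, 3, 6, 6, 6]
ys = [0, 0, 0, 0, 0, 0, 0]
v = speed(xs, ys, sample_rate_hz=1.0)
print(v, fast_segments(v, 1.0))
```

Samples outside the returned segments correspond to the parts of
the signature that would be traced with a dotted line.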



Figure 5 


If we now want to "zoom" in on a particular
section of the whole signature, a command exists to
cut out the desired section and redraw it magnified
(e.g. the "an" of "Jean"). Figure 6 shows the
result.

As we have already said, other useful
functions are available to help the user in the
study of a handwriting task, and figure 7 shows some
of them. A more complete subset of functions will
be shown at the oral presentation.



V- CONCLUSIONS 

We have developed a tool of practical interest
for studying signatures (or handwriting in general) in
a pleasant environment. It runs on a mainframe,
which provides computing power and a large library of
already developed programs for mathematical
transformations and digital signal processing. The
program has provision for additional functions. The
feature extraction process can now be carried out
interactively with this software tool. We
presently use it to develop handwriting formation
models (and to verify models already proposed in
the literature) in order to better understand the
nature of handwriting.
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Abstract 

We present an application of computer vision in the
field of chemical engineering, the processing of images of
glass fibers. Fibers are used as reinforcement for several
polymer products to enhance their mechanical and
thermal performance. In order to evaluate the properties of
the polymer products, however, a quantitative measure of
fiber length and orientation is required.

The algorithm presented here locates the fibers present
in a given image and thus enables their quantification in
length and orientation. This algorithm operates in two
steps. In the first step, feature points in the image are
extracted in order to enable the generation of
hypotheses as to the possible presence of fibers. In the second,
the generated hypotheses are verified and the hypotheses
that yield the highest confidence are retained.


1. Introduction 

We present an application of computer vision in the
field of chemical engineering, the processing of images of
glass fibers. Fibers are used as reinforcement for several
polymer products to enhance their mechanical
properties and thermal performance. However, the
process-induced orientation distribution of fibers is anisotropic, and
this results in direction-dependent mechanical properties.
Thus, a quantitative measure of fiber length and
orientation is required for predicting the mechanical properties
of the product.

In a typical image of glass fibers, the fibers have a
high intensity, as opposed to a noisy background which
has, on average, a low intensity. In addition, the fibers
appear as straight line segments of differing lengths. Thus,
in looking for fibers, we actually focus our attention on
detecting high intensity straight line segments on a low
intensity and noisy background. A number of techniques
have been developed in order to solve this problem.
Relaxation labelling is one of them [1]: using this technique,
an initial set of orientations is assigned to the image
points, and these orientations are iteratively altered until
the orientation at each point becomes the one dictated
by its neighborhood. The main drawback of relaxation
labelling and similar techniques lies in their excessively
large computational complexity, which renders them
impractical for most applications. Another technique that
is in use for detecting line segments is the Hough
transform [2]. The Hough transform is a mapping from image
space to parameter space (usually taken as distance and
orientation) in which collinear points in the image appear
as clusters of points in the parameter space. However,
the Hough transform does not distinguish between
connected and non-connected points. This results not only
in interference between points residing on different
segments, but also in ambiguous interpretations of the
clusters in parameter space.

We take a different approach for solving the fiber
detection problem. For this, we adopt the well known
hypothesis prediction/verification paradigm [3]. This
algorithm consists of two major parts: a predictor, which
predicts possible fiber locations, and a verifier, which verifies
the predictions and discards those which do not satisfy all
of the constraints. Bearing in mind that template
matching is itself a form of hypothesis prediction/verification
(although of an exhaustive nature), it becomes clear that
the main requirement that is imposed on the predictor is
that it should significantly limit the scope of its
predictions. In other words, the image-based features that are
used by the predictor in formulating hypotheses should
be closely correlated with the hypotheses themselves. In
what follows, the features that are used in formulating
hypotheses are described, and the cost function which
allows the verifier to prune hypotheses is presented.

The complete algorithm is implemented in the C programming
language on a VAX-11/750 running the UNIX
operating system.

2. Predicting Fiber Locations 

Consider a slide of glass fibers. The slide is illumi¬ 
nated by direct lighting and the reflected light pattern is 
viewed with a camera mounted on a microscope. This 
yields an image in which the regions occupied by the 
fibers have a large gray value, as opposed to the back¬ 
ground gray value, which occupies, on average, the low 
intensity range. Thus, in such images, the histogram is
bimodal, and it is possible to differentiate between the
fibers and the background by simple intensity thresholding.
The threshold is selected using the p-tile method,
whereby a certain percentage of the pixels are assigned to 
the background, while the remaining pixels are assigned 
high intensities and are identified as forming the regions 
occupied by the fibers. 
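
The p-tile selection described above can be sketched as follows. This is only an illustration and not the authors' C implementation; the function names `ptile_threshold` and `binarize`, and the representation of the image as a list of rows, are assumptions made for the example.

```python
def ptile_threshold(image, background_fraction=0.85):
    """Return a gray level t such that roughly `background_fraction`
    of the pixels have a value <= t (the p-tile method)."""
    pixels = sorted(v for row in image for v in row)
    if not pixels:
        raise ValueError("empty image")
    # Index of the brightest pixel that is still assigned to the background.
    k = round(background_fraction * len(pixels)) - 1
    k = max(0, min(k, len(pixels) - 1))
    return pixels[k]


def binarize(image, threshold):
    """Label pixels strictly brighter than the threshold as fiber (1)."""
    return [[1 if v > threshold else 0 for v in row] for row in image]
```

With many pixels sharing the same gray level, the fraction actually assigned to the background can only approximate the requested value.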

In our experiments, the threshold is chosen in such 
a way as to assign 85 percent of the pixels to the background,
and only 15 percent to the fibers. This specification
is directly related to the density of the fibers.
Since this density is relatively constant, the percentage 
value specification is invariant with respect to the type 
of images that is being dealt with. As a result of inten¬ 
sity thresholding, the image is partitioned into a set of 
regions, all of which (except for the background) have 
resulted either from the glass fibers or from noise. 

In order to partially eliminate noise, regions with very 
small areas are discarded. In order to detect the fibers, 
even those inside large clumps, and then measure their 
length, the geometrical properties of the remaining re¬ 
gions have to be analyzed. These geometrical properties 
are captured by first thinning the regions. The thinning
algorithm used is Tsuruoka’s sequential algorithm for
4-connected thinning of binary images [4]. The advantage
of thinning lies in the fact that it reduces the amount of 
data to be processed at later stages, while preserving its 
integrity as far as the number of fibers, as well as their 
individual length and orientation are concerned. In addition,
thinned regions allow us to formulate hypotheses
as to the placement of the fibers. The observation underlying
this statement is that thinning imposes a structure
on the local arrangement of pixels inside a region.
Since our objective is to detect glass fibers, which are 
straight line segments, knowledge of the endpoints of a 
glass fiber is sufficient to determine its position, length 
and orientation. Thus, in order to produce hypotheses 
as to the placement of the fibers, we will, in the first 
step, detect those pixels which could be located either at 
the extremities of fibers, or at their intersection. Knowledge
of these endpoints and intersection points will allow
us to predict all possible configurations of fiber arrangements,
and then retain only the one configuration that
best matches the fiber arrangement in the original image.

In order for a pixel P to be an end-point, it should
possess one of the following neighborhood configurations
(modulo reflections and rotations):

[Figure: end-point neighborhood configurations]

where the x sign denotes a pixel which belongs to the
same region as P. Also, in order for P to be an
intersection-point, it has to have one of the following
neighborhood configurations (modulo reflections and rotations):

[Figure: intersection-point neighborhood configurations]
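
As a rough illustration of this step, the sketch below classifies the pixels of a thinned region by counting neighbors. It does not reproduce the authors' neighborhood templates: the rule used here (exactly one 4-neighbor for an end-point, three or more for an intersection-point) is an assumed stand-in chosen because the thinning is 4-connected, and the function name `classify_skeleton_pixels` is hypothetical.

```python
def classify_skeleton_pixels(skeleton):
    """Return (end_points, intersection_points) of a binary skeleton.

    Assumption: a skeleton pixel with exactly one 4-connected neighbor
    is an end-point; one with three or more is an intersection-point.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    ends, crossings = [], []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            count = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc]:
                    count += 1
            if count == 1:
                ends.append((r, c))
            elif count >= 3:
                crossings.append((r, c))
    return ends, crossings
```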

Let R be a thinned region, defined by its constituent
image points. Let S be the set of end-points and
intersection-points which also belong to R. Since a line segment can
be defined by a pair of points, generating hypotheses as
to the presence of fibers is equivalent to mapping the set
S into the Cartesian product of S with itself, i.e. S × S.
The elements of S × S are of the form (P_i, P_j) where
P_i and P_j are end points or intersection points of the
thinned region R. Note that the hypotheses which correspond
to lines of length 0, i.e. to elements in S × S
which are of the form (P_i, P_i), are not considered. In addition,
since for our purposes the line segment predicted
by the pair (P_i, P_j) is the same as that predicted by the
pair (P_j, P_i), we need consider, in total, less than half of
the generated hypotheses. Once all possible hypotheses
have been generated, they are forwarded to the hypothe¬ 
sis verifier which retains only those which could be valid, 
according to some specific criterion. 
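
A minimal sketch of the hypothesis generation step: for a set S of n points, discarding the n trivial pairs and keeping one of each symmetric pair leaves n(n-1)/2 hypotheses, which is less than half of the n² elements of S × S. The function name `generate_hypotheses` is an assumption made for the example.

```python
from itertools import combinations


def generate_hypotheses(points):
    """Return every unordered pair of distinct points of S.

    Each pair (P_i, P_j) with i < j stands for one candidate line segment;
    zero-length pairs and mirrored duplicates are never produced.
    """
    return list(combinations(points, 2))
```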

3. Verifying Hypotheses 

Let H be the set of all possible hypotheses. We have
H = S × S. As was mentioned previously, we consider
only those hypotheses which are neither trivial nor redundant.
Evaluating the generated hypotheses and retaining
those which could be true is equivalent to mapping H into
a set of hypotheses H*, where all hypotheses belonging
to H* are true according to some specific criterion. This
criterion should reflect as much as possible the information
in the original image I(x, y), as far as the placement
of fibers is concerned. One criterion that could be used in
evaluating the hypothesis h ∈ H would be the following:

    F(h) = \frac{1}{N} \sum_{(x,y) \in L_h} I(x, y)

where L_h is the line segment predicted by hypothesis h,
and N is the number of points on L_h. Since the background
has, on average, a lower intensity than the regions
occupied by the glass fibers, the above summation
will yield a large value whenever the line segment corresponding
to a predicted hypothesis actually overlaps with
a glass fiber, and a much lower value if the overlap is only
partial. We would thus have:

    h \in H^* \iff F(h) > T

where T is some threshold imposed on the criterion F.
This approach could work in some limited cases. Its main
drawback is its dependence on variations in the input image,
namely, in the gray levels of the regions occupied
by the fibers. In order to remove this dependence, the
predicted hypotheses are not matched with the original
image, but rather with the thresholded image, where regions
occupied by the fibers have a gray value of 1 and
the background has a gray value of 0. Thus, our new
criterion is:

    F(h) = \frac{1}{N} \sum_{(x,y) \in L_h} I_T(x, y)

where I_T is the thresholded image. The above criterion
returns a value between 0 and 1. Again, as before, a
hypothesis h is retained if and only if its associated criterion
function F(h) has a value larger than a threshold
T. The closer this threshold is to 1, the better the match
is between the selected hypotheses and the actual fiber
placements. Conversely, if T is chosen close to 0, a large
number of false hypotheses will be retained. Thus, the
tradeoff on the choice of the threshold T is the traditional 
tradeoff between accepting a hypothesis given that it is 
false, and rejecting it given that it is true. Note that we 
could also have matched the predicted hypotheses with 
the thinned image. The main drawback with doing this,
however, lies in the lack of robustness of such a match
to slight variations in line orientations (which would be 
due to noise), bearing in mind that line segments in the 
thinned image are only one pixel wide, while they are 
many pixels wide in the thresholded image. 
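
The verification criterion on the thresholded image can be sketched as below. The paper does not say how the points of a predicted line segment are enumerated; the uniform sampling in `line_points` is an assumption, as are the function names and the (x, y) endpoint convention.

```python
def line_points(p, q):
    """Integer pixels on the segment from p to q, both given as (x, y).

    Assumption: the segment is sampled uniformly, one sample per step
    along its longer axis.
    """
    (x0, y0), (x1, y1) = p, q
    n = max(abs(x1 - x0), abs(y1 - y0))
    if n == 0:
        return [(x0, y0)]
    return [(round(x0 + (x1 - x0) * i / n), round(y0 + (y1 - y0) * i / n))
            for i in range(n + 1)]


def match_score(binary, p, q):
    """F(h): fraction of the predicted segment lying on fiber pixels."""
    pts = line_points(p, q)
    return sum(binary[y][x] for x, y in pts) / len(pts)


def verify(binary, hypotheses, threshold=0.9):
    """Keep only the hypotheses whose score exceeds the threshold T."""
    return [(p, q) for p, q in hypotheses
            if match_score(binary, p, q) > threshold]
```

Because the thresholded regions are several pixels wide, a segment that deviates slightly from the fiber axis still scores close to 1, which is the robustness argument made above.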

Once the set H* of fiber locations has been generated,
additional processing is necessary in order to group
together line segments which have been artificially split.
This happens whenever many fibers intersect. The result 
of each intersection is an additional intersection point, 
which yields additional hypotheses. In order to overcome 
this, the cosine of the angle between any two segments 
in H* which have a point in common is used as a similar¬ 
ity measure to decide whether or not two segments are 
actually part of a common larger segment. 
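
One possible reading of this grouping step is sketched below. The direction vectors are taken pointing away from the shared point, so two halves of one fiber give a cosine close to -1. The tolerance `min_alignment` and the function name `merge_if_collinear` are assumptions; the paper does not state the value it uses.

```python
import math


def merge_if_collinear(seg_a, seg_b, min_alignment=0.98):
    """Merge two segments that share exactly one endpoint and are nearly
    collinear; return the merged segment, or None otherwise.

    Each segment is a pair of (x, y) endpoints.
    """
    shared = set(seg_a) & set(seg_b)
    if len(shared) != 1:
        return None
    s = shared.pop()
    a = seg_a[0] if seg_a[1] == s else seg_a[1]
    b = seg_b[0] if seg_b[1] == s else seg_b[1]
    ux, uy = a[0] - s[0], a[1] - s[1]
    vx, vy = b[0] - s[0], b[1] - s[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        return None
    cosine = (ux * vx + uy * vy) / norm
    # Opposite directions from the shared point: one longer segment.
    return (a, b) if cosine <= -min_alignment else None
```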

A summary of the complete algorithm is shown in 
figure 1. 

4. Experimental Results 

Figure 2 shows a typical image of a slide of glass 
fibers. Note that in addition to the clusters formed by 
the fibers, complications are introduced by the presence 
of background noise. 

Figure 3 shows I_T, the result of thresholding.

Most of the fibers have been retained during the
thresholding process owing to their large gray value; however,
a number of regions which are due to noise have also
been detected. These regions, which are small in size, 
are eliminated in a subsequent step. Figure 4 shows the 
result of thinning the thresholded image. 

As expected, the amount of data to be processed
has been drastically reduced, while shape information relating
to fiber placement and orientation has been preserved.
Figure 5 shows the result of the hypothesis prediction/verification
process. The threshold T that is imposed
on the quality of the match is chosen equal to 0.9.
In other words, in order for a certain hypothesis to be
verified as true, there must be at least 90 percent of the
line predicted by that same hypothesis overlapping the
region occupied by the fibers in the thresholded image.

Figure 1 Flowchart of the algorithm

Figure 2 Original image

As can be seen, a large number (80 to 85 percent) of 
fibers have been properly detected. Fibers which have not 
been detected are mostly associated with noisy regions, 
or even regions with an excessively large number of inter¬ 
secting fibers, where the shape information is distorted 
during the thinning process. This distortion results in a 
series of false hypotheses, which are then rejected during 
the hypothesis verification process, owing to the rather 
poor match between the predicted line segments and the 
actual fiber locations. Conversely, a number of line segments
which are not associated with fibers but rather
with noisy clusters are also detected.

Figure 4 Thinned image

Figure 5 Detected fibers

Figure 6 shows another image of glass fibers. 

In figure 7, the result of the processing is illustrated.
Again, the same proportion of glass fibers has been detected.


Figure 7 Detected fibers 

The procedure currently detects between 75 and
95 percent of the fibers present in a given image, depending
mainly on its complexity. The use of computer vision 
for glass fiber detection has yielded a dramatic increase 
in the analysis speed of the cross-sectioned experimental 
samples. 

5. Conclusion 

In this paper, an efficient algorithm for detecting glass 
fibers in polymer products was presented. The algorithm 
consisted of two parts: a predictor, which predicted pos¬ 
sible fiber locations, and a verifier, which matched the 
predictions against information derived from the original 
image. The results of applying this algorithm to images 
of glass fibers were shown to be successful. 

Although this algorithm currently detects between
75 and 95 percent of the fibers present in a given
image, possible improvements can be made by refining 
the segmentation procedure used in this algorithm; this 
will lead to a better discrimination between fibers and 
background, and will hence reduce the loss of shape in¬ 
formation. 
ABSTRACT

The retina is an approximately spherical structure. In
order to gather information such as the density of rods and
cones it is necessary to flatten the retina. It is desirable to
project these measurements back onto the original spherical
form of the retina, interpolate the sampled data, and
display the results. This paper is a summary of techniques
which we have developed to perform these tasks.

RESUME

La rétine peut être approximée par une surface
sphérique. Pour recueillir certaines informations comme
la densité des cônes et des bâtonnets, il faut aplatir la
rétine. Nous aimerions reprojeter ces mesures sur la surface
sphérique originale, interpoler les données recueillies et
afficher les résultats. Cet ouvrage est une synthèse des
techniques que nous avons développées pour effectuer ces
tâches.

KEYWORDS: human retina, reconstruction, spherical
geometry, interpolation.

This work was supported in part by the National Science Foundation
under grant number DCR-8505713, by a Lions Northwest
Training Fellowship, and by the National Institutes of Health under
grant number EY04536.

1. Introduction

The topography of the constituent cells and efferent
pathways of the retina is important for understanding how
the visual world is sampled and how it is represented in
the central nervous system. The retinal whole mount is
the histological method of choice for revealing these
topographical relationships (Stone, 1981, for review). Because
the retina covers the major part of the sphere, it must
be cut so that it can be flattened for viewing under a
microscope. Thus, a general problem with whole mounts
is that spatial relationships are lost across the cut edges.
Furthermore, locations of features on the retinal sphere,
which in theory could be specified with great precision, are
not readily determined from their positions in the flattened
tissue.

These problems may be solved by reconstructing the
original spherical surface from the flattened tissue. Such a
reconstruction has been accomplished manually by
approximating tracings of the tissue to the surface of a sphere of
appropriate diameter (e.g., Østerberg, 1935).

The advent of sophisticated and affordable computer
technology has made digital reconstruction techniques
possible. We report methods for specifying a retinal coordinate
system, reconstituting the retinal sphere from a three-piece
whole mount (Curcio, et al., in preparation), and displaying
topographic data.

Our key reconstruction step relies on the fact that one
of the three pieces of the retina has a particularly easy
mapping back to the sphere, based on natural landmarks.
Once this piece has been placed on the sphere, the other
two are positioned relative to it, using a small set of fiducial
points. In the first case, we can assume that there has been
little or no distortion of the tissue. For the second placement
problem, we cannot make this assumption. Instead,
we assume that the tissue has been warped, and rely on an
iterative relaxation procedure to place each point.

The reconstructed retina is then used as the basis
for display. We construct a triangular mesh connecting
the sampled points. Measured quantities (rod, cone and
ganglion cell densities) are represented as intensities (usually
false-colored). The triangular mesh can be directly dis¬ 
played to give a direct three-dimensional view of the retina, 
or projected onto the plane in the style of conventional 
visual field maps. To generate a true visual field map, we 
can back-project the retina through a standard model of 
the eye’s optics. 

The display techniques have been applied to Østerberg’s
data (1935), and have provided invaluable guidance
in the design of the sampling scheme which we are using
now.

2. Coordinate Systems 

2.1 Retinal Sphere 

The retina is treated as the surface of a unit sphere. 
Any point on the sphere can be indexed by two coordinates,
λ (longitude, meridian) and φ (colatitude, eccentricity).
This coordinate system allows us to make comparisons 
between eyes of different diameters. 

Eccentricity is measured from the center of the fovea. 
The nasal horizontal (0°) meridian is defined as the merid¬ 
ian passing through the center of the fovea and the center 
of the optic disk. The superior vertical meridian is at 90°, 
the temporal horizontal meridian is at 180°, and the inferior
vertical meridian is at 270°. 

2.2 Microscope stage 

We developed a 3-piece whole mount dissection tech¬ 
nique which results in a belt 60° wide roughly centered on 
the horizontal meridian, and two caps from the inferior and 
superior retina. These three pieces can be flattened without 
tearing the retina; the belt is only very slightly distorted. 

The locations of data points on the whole mount are 
expressed in terms of the X,Y coordinates of the microscope 
vernier. The raw data base consists of a collection of 
such X,Y points, along with measurements made at these 
locations, such as the density of rods or cones. 

In order to assist in the reconstruction, we also note the 
positions of the fovea, the optic disc, and several (about ten) 
key points (usually blood vessels) along each of the shared 
boundaries between the belt and the caps. 

2.3 Visual field 

It is important to be able to express retinal location in 
terms of functionally defined locations in the visual field. 
The projection of the visual field onto the retina has been 
deduced by tracing the path of rays through the optical 
apparatus of an average eye. The projection is nonlinear, 
such that a degree of visual angle subtends a greater extent 
of retina centrally than peripherally. The exact nature
of the nonlinearity varies among different schematic eyes,
based on different underlying assumptions (e.g., Drasdo
and Fowler, 1974). One advantage of our decision to
keep all retinal data in a retina-based coordinate system is
that it remains available for transformation to visual field
coordinates by any suitable schematic eye.

3. Mappings

3.1 Three Planar Patches ⇒ One

The three separate planar patches (Belt, Inferior Cap
and Superior Cap) are positioned in a common, planar
coordinate system so that (see Figure 1):

• the fovea is at the origin

• the optic disc is on the positive x axis

• the inferior and superior caps are attached to, and
  tangent to, the belt, at a common key point.

3.2 Plane ⇒ Sphere

Mapping the central belt to the sphere is relatively easy. 
The shape and extent of this patch was chosen so that it 
could be flattened easily, without appreciable distortion. 
The fovea and optic disc provide all the landmarks nec¬ 
essary to orient the planar stage coordinates with respect 
to our spherical coordinate system. All that needs to be 
done is to wrap the rectangular belt around the sphere. 
First, we estimate the diameter of the retinal sphere by 
measuring the outer diameter of the eye, and the thickness 
of the sclera. The (x, y) coordinates can then be scaled from
mm (as measured by the microscope vernier) to degrees of
arclength (as measured on the unit sphere). They can then
be considered to be two sides of a right spherical triangle.
The (λ, φ) coordinates are straightforward to calculate from
this triangle.
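
One way to carry out this calculation is sketched below, under stated assumptions: x is laid along the horizontal meridian from the fovea, y is laid along the great circle perpendicular to it (positive toward the superior retina), and angles are kept in radians rather than degrees. The eccentricity then follows from the spherical Pythagorean relation cos φ = cos a · cos b, and the meridian from tan λ = tan b / sin a. The function name `belt_to_sphere` is hypothetical.

```python
import math


def belt_to_sphere(x_mm, y_mm, radius_mm):
    """Map planar belt coordinates (mm) to (lam, phi) in radians.

    lam is the meridian in [0, 2*pi); phi is the eccentricity measured
    from the fovea. The axis conventions are assumptions of this sketch.
    """
    a = x_mm / radius_mm          # arc length along the horizontal meridian
    b = y_mm / radius_mm          # arc length perpendicular to it
    # Hypotenuse of the right spherical triangle with legs a and b.
    cos_phi = max(-1.0, min(1.0, math.cos(a) * math.cos(b)))
    phi = math.acos(cos_phi)
    # Angle at the fovea between the horizontal meridian and the hypotenuse.
    lam = math.atan2(math.sin(b), math.cos(b) * math.sin(a)) % (2 * math.pi)
    return lam, phi
```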

This calculation correctly maps all points in the belt 
(including the belt’s version of the keypoints) onto the 
sphere. However, it provides only a very gross estimate 
of the position of a point in either of the caps. 

In order to position a cap point, we use an iterative 
relaxation process. In the planar stage coordinates, we 
calculate the distance between the cap point and each of 
the key points in that cap. After mapping to the sphere, 
we discard the key points associated with the cap, and look 
at the corresponding key points in the belt. These points 
have been correctly positioned on the sphere. We measure 
the (great circle) directions and distances between the cap 
point and each key point. Of course, the distances will 
be different than those measured in the plane. By taking 
the vector sum of these differences, and moving the cap 
point (on the sphere) until this sum approaches zero, we 
find a point on the sphere which minimizes our placement 
error. The process is observed to converge in some 10-30 
iterations. 
It is very important to properly weight the effect of
individual key points. On the one hand, each key point 
provides some information about the proper placement of 
a particular point. On the other hand, key points which 
are very far away are less reliable, primarily because the 
warping is likely to be very non-linear over large distances. 

Giving each key point equal weight results in obvious, 
gross errors. Weighting each key point by the inverse of the 
(planar) distance to the point to be placed is significantly 
better, but still produces an occasional misplacement. In¬ 
verse square distance is currently in use, and appears to 
appropriately balance the contribution of each key point to 
the final placement. 

After this mapping, we have the original data points 
positioned in our canonical spherical coordinate system. 
See Figure 2. 

4. Triangulation 

Effective display of this data requires more than isolated 
data points. We would like to fit a surface to these points, 
and use that surface to provide estimates of the measured 
quantities everywhere on the retina. A first step in this 
direction is to tessellate the sphere with triangular patches,
using the data points as vertices. 

The data points are connected into a triangular net by
projecting them back into the plane and computing the locally
equiangular triangulation of the projected points, which is
the completion of the Delaunay tessellation (Sibson, 1978).
Given the relatively small size of our data (typically 200 
points), we directly calculate the equiangular triangulation. 
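
For a few hundred points a direct calculation is feasible. The sketch below is one assumed form of such a calculation, using the empty circumcircle property of the Delaunay triangulation; it is not taken from the paper, and the names are hypothetical. It tests every triple against every other point, so it is far too slow for large data sets, and it does not handle four or more cocircular points specially.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    (ax, ay), (bx, by), (cx, cy) = [(q[0] - p[0], q[1] - p[1]) for q in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det * _orient(a, b, c) > 0


def delaunay_brute_force(points):
    """Return index triples whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if _orient(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        if not any(_in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles
```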

The projection to the plane used is the equidistant
(polar azimuthal equidistant) projection (Frisén, 1970),
achieved by treating the spherical (λ, φ) coordinates as polar
coordinates. This projection preserves radial distances
(eccentricity), while stretching tangential distances. The
surface of the sphere maps into a disc of radius π. The
worst distortion is at the opposite pole, which maps to the
circle surrounding this disc.
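
Treating (λ, φ) as polar coordinates amounts to the pair of conversions below, with angles in radians. The function names are assumptions made for the example.

```python
import math


def sphere_to_plane(lam, phi):
    """Polar azimuthal equidistant projection: radius phi, angle lam."""
    return phi * math.cos(lam), phi * math.sin(lam)


def plane_to_sphere(x, y):
    """Inverse projection; returns (lam, phi) with lam in [0, 2*pi)."""
    return math.atan2(y, x) % (2 * math.pi), math.hypot(x, y)
```

A point at eccentricity π (the pole opposite the fovea) lands on the rim of the disc whatever its meridian, which is why the distortion is worst there.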

In the plane, the Delaunay triangulation is optimal in 
the sense that the triangles are as compact as possible. 
When projected back to the sphere, this triangulation is 
very good near the fovea, and less so near the periphery. 


5. Display 

The last problem is to display this spherical triangu¬ 
lar mesh, with measured density values at each vertex. 
The customary method of presenting visual field data is 
to project to the plane (using the equidistant projection) 
and plot the density values as gray levels (or isodensity 
contours). It is easy to augment this display with gamma 
correction and false coloring (Sloan and Brown, 1979). We 
have a choice of painting each triangle with an average in¬ 
tensity value, or of interpolating the values measured at the 
vertices (see Figures 3-7). In either case, the triangulation 
developed above is exactly right for this projection. 
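
The two shading choices can be illustrated as follows. Linear (barycentric) interpolation is assumed here for the smooth-shaded case; the paper does not name the interpolation scheme, and the function names are hypothetical.

```python
def flat_shade(values):
    """One intensity per triangle: the mean of its three vertex values."""
    return sum(values) / 3.0


def interpolate_in_triangle(p, tri, values):
    """Linearly interpolate vertex values at point p of a planar triangle.

    tri is a triple of (x, y) vertices; values holds the measured
    density at each vertex, in the same order.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of p with respect to the three vertices.
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]
```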

This display is similar to those that anatomists and 
ophthalmologists are used to seeing, and have little trouble
interpreting. We can also, of course, display the triangular 
mesh (colored as above) as a three dimensional, spherical 
surface. Going further, we can use the measured densities 
to deform the surface away from the sphere. Finally, we 
are beginning work on fitting a smooth surface to the data, 
using our triangular mesh as a control graph. The methods
of Farin (1983) can be applied directly to our data.

6. Discussion 

Østerberg’s paper (1935) on the distribution of rods
and cones in the human eye is one of the most widely 
cited studies in the vision literature. The findings most 
frequently cited and reprinted are the density of rods and 
cones along the horizontal meridian and the calculation of 
the total number of photoreceptors. His data on the overall 
topography of photoreceptors, illustrated by density maps 
(contours for rods, symbols for cones) are less well appreci¬ 
ated, mainly because of the relatively less accessible nature 
of his maps. Our display techniques, applied to this data, 
provide much better intuition about the gross topography, 
and have pointed out deficiencies in his sampling scheme. 

For example, look at Figure 6, which shows the central
8° of Østerberg’s eye. Notice that he sampled very finely
along the 0° meridian, but much more coarsely in other
directions. When this sampling is displayed directly, it gives
a distorted picture of the shape of the density map. If one
assumes that the map is radially symmetric, then a better-looking
density map could be constructed (by duplicating
the 0° points), but this “better-looking” picture would be
more a product of the assumption than of the measured 
data. By comparison, look at Figure 7, which shows the 
same central 8° of a more recent eye. 
Figure 1. The three-piece dissection. The belt is roughly
rectangular, and can be positioned on the sphere using the 
fovea and optic disc as landmarks. The inferior (superior) 
cap is positioned relative to the belt by means of key points 
which can be located in both the belt and the cap. 
Figure 2. Sample points from a typical reconstruction,
displayed using the polar azimuthal equidistant projection. 


Figure 3. A display of Østerberg’s rod density data,
smooth shaded (in the plane) and false colored. 



Figure 4. A rod map from a different eye, with the 
triangles shaded according to the average density at each 
vertex. 


Figure 5. The same rod map as in Figure 4, displayed 
smooth shaded. 



Figure 6. The central 8° of Østerberg’s cone density
data. Note the anisotropic sampling, and the “kite-shaped” 
pattern. 



Figure 7. The central 8° of one of our recent eyes. 
ABSTRACT

A method for isolating and recognizing 
characters in typeset text containing 
touching or overlapping characters is 
described. The method relies on vector 
quantization techniques to represent the 
text by an ordered sequence of codes. 
Characters or character fragments are 
extracted from this code using a form of 
string matching and network searches. 
Compatibility relationships are applied 
to these fragments to obtain a list of 
possible input strings. Our simulations 
on the Symbolics 3640 workstation do not 
indicate that this method would replace 
conventional template matching schemes. 

KEYWORDS: optical character recognition, 
template matching, vector quantization, 
consistent labeling problem. 

INTRODUCTION 

With the continuous drop in the price of 
computer memory and VLSI components, it 
is easy to envision new and more powerful 
OCR devices becoming available. Such 
devices would utilize several different 
recognition strategies depending upon the 
quality of the input; they would be 
capable of learning a new font without 
human supervision; and they would contain 
large spelling dictionaries for applying 
contextual postprocessing [1-4]. Software 
development costs rather than the cost of 
hardware components would be the limiting 
factor to building such a system. 

Automatic reading machines can process 
good quality typescript input at error 
rates approaching those of the average 
typist; however, only a few machines can
handle typeset material containing 
proportionally-spaced characters [5]. A 
recurrent problem is the presence of 
touching characters which invariably 
results in segmentation and 
misclassification errors (e.g., rn confused
with m). Without gaps separating 
characters, the character extraction and 
classification process can no longer be 
performed independently, and this 
increases the complexity of the
recognition problem considerably. 

There have been several studies [6-8] on 
the segmentation problem of touching 
characters. Some excellent results have 
been obtained using a combination of 
dynamic programming and template matching
[8]. A new and promising approach uses
word shape matching [9]. Our interest in 
this problem has been motivated primarily 
by its similarity to the connected speech 
recognition problem. 

In fluent spoken speech, there are no 
silence gaps between words [10]. The 
computational requirements for segmenting
speech into words presently preclude the
operation of any real time speech
recognizer on a vocabulary of 5000 words
[11].

In both speech recognition and optical 
character recognition, template matching 
techniques have been the preferred 
approach in commercial devices [12 and 
13]. Though it is recognized that feature 
extraction techniques are more robust in 
handling multifonts, template matching 
techniques are fast and easy to implement 
and can handle poor quality input 
containing noise, voids and touching 
characters [13]. However, template 
matching schemes have little tolerance to 
minor type-face variations, of which 
there are more than 3000 in common usage 
[5]. Some of the newer schemes [14] 
attempt to apply combinations of these 
two techniques and hopefully reap the 
benefits of both methods. 

Preprocessing techniques in speech 
recognition rely on data compression 
techniques (e.g., linear predictive coding)
in order to extract the essential 
information from the input data. Some of 
the more recent recognition methods also 
apply the rediscovered vector 
quantization coding schemes [15 and 16] 
to reduce the data to an input stream of 
symbols. The Hidden Markov Model [17-21]
is then used to interpret this stream of
symbols. Though the accuracy of the 
vector-quantization based algorithms is 
lower than that of the conventional 
dynamic time warping techniques, the 
method requires an order of magnitude 
less processing time [20]. A further 
advantage of the Hidden Markov Model is 
that it avoids the need to explicitly 
code subjective rules about the 
interpretation of the patterns [22]. 

In our investigation, we exploited the 
vector quantization coder to extract the 
structural features of the characters. 
String matching techniques were then used 
to extract possible characters or 
character fragments from the text and 
network constraints were used to 
eliminate some of the interpretations. 

Our study has so far been limited to the 
computer-generated fonts on a Symbolics 
Model/3640 workstation. This approach is 
effective for prototyping our design 
since it expedites the training process 
and factors out other processes in an OCR 
such as skew detection, correction for 
uneven illumination and distortions 
introduced by the optics system. Our 
simulations have provided us with useful 
information. 


PREPROCESSING AND DATA REDUCTION 

In the first step the image 
representation of a line of text is 
converted into an ordered list of 
discrete symbols to permit the 
application of one of many contextual 
pattern recognition schemes [1 and 23].
For example, the stream of symbols could 
be viewed as a grammar containing 
sufficient information to separate and 
identify the characters using syntactical 
pattern recognition schemes [24]. 

A short character string was displayed in 
a window of the Symbolics workstation 
using the TIMESROMAN12 font (one of a 
hundred fonts available in this 
workstation). Character spacing was 
adjusted so that adjacent characters 
either touched or overlapped in the 
horizontal direction. A low-pass spatial 
Gaussian filter [14] was applied to this 
window to simulate the optics of an OCR 
scanner and the resulting image was 
resampled to a lower resolution along a 
regular rectangular grid. The resampled 
image was thresholded to form a binary 
representation. 
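
The smoothing, resampling and thresholding steps can be illustrated with a minimal sketch. It assumes the window is a list of rows holding 0.0 (white) or 1.0 (black); the kernel radius, the resampling step and the function names are illustrative assumptions, while the unit variance and the threshold of 0.3 are the values quoted in the caption of Figure 1.

```python
import math

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(values, kernel):
    """Convolve one row, treating samples beyond the edge as white (0)."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out

def smooth_2d(image, kernel):
    """Separable Gaussian blur: filter the rows, then the columns."""
    rows = [smooth_1d(row, kernel) for row in image]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def resample_and_threshold(image, step=2, level=0.3):
    """Keep every `step`-th sample in both directions, then binarize."""
    return [[1 if value >= level else 0 for value in row[::step]]
            for row in image[::step]]

def column_vectors(binary):
    """Return the binary image as a left-to-right list of column vectors."""
    return [list(col) for col in zip(*binary)]
```

Calling `column_vectors(resample_and_threshold(smooth_2d(window, gaussian_kernel())))` on a text window yields the list of binary column vectors that the data reduction step operates on.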


The sample column vectors extracted from 
this grid consist of sequences of ones 
and zeros, which were then run-length 
encoded. Using an appropriate distance 
measure and a training set of 5074 sample 
column vectors derived from the 
characters in the TIMESROMAN12 font, the 
sample column vectors were grouped into 
79 categories. The distance measure was 
based on the number of runs of black 
samples (ones), their position and their 
length. The categories were chosen so as 
to ensure that no training vector was 
further than a certain distance from its 
category template. 

Using this library of category templates, 
which we call the prototype list, the 
sample vectors in the text window were 
converted into a list of prototype codes 
which referenced the position of the 
closest matching prototype in the 
prototype list. This encoding scheme is 
similar to the vector quantization 
compression scheme applied to speech. 
The text data can be reconstructed from 
the code list with a nominal amount of 
distortion (Figure 1). 

Figure 1. The characters in the 
TIMESROMAN12 font (39 points high 
including descenders) were smoothed and 
resampled with a unit variance two 
dimensional Gaussian filter and 
thresholded at a level of 0.3. The 
resampling density was chosen to yield 16 
points per column. The sample column 
vectors were quantized to one of 79 
prototypes, producing a code list as 
shown for the letter t. 


SEGMENTATION AND CHARACTER RECOGNITION 

Given the stream of prototype codes from 
a text window, the labeling of the codes 
as elements of characters is a form of 
the consistent labeling problem [25]. 
Each column vector may be a part of many 
possible characters; however, local 
contextual constraints limit the 
solution. The consistent labeling 
problem is NP-complete and includes the 
graph homomorphism problem, the graph 
colouring problem, the packing problem, 
the scene labeling problem, the shape 
matching problem, the constraint 
satisfaction problem and theorem proving 
[25]. For this reason, the Symbolics 
workstation was considered a suitable 
tool for performing this study. 


A related problem is the restoration of 
spaces between words in the following 
text string 

'heworkswithcomputers'. 

Using a large spelling dictionary, the 
computer can find nine possible words 
embedded in the string: 

computer he or with works 

computers it put work. 

However, assuming that every letter must 
belong to exactly one word in the 
dictionary, the computer will arrive at 
the correct solution 

'he works with computers'. 

The dictionary search is the most 
computationally intensive component of 
this problem. Special string comparison 
VLSIs [26] allow rapid solution of this 
problem on a PC with a large spelling 
dictionary. 
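
The word-spacing example can be reproduced with a short sketch. The nine-word dictionary is the one listed above; the function names are illustrative, and the exhaustive recursive search is only one simple way of enforcing the constraint that every letter belongs to exactly one word.

```python
DICTIONARY = {"computer", "computers", "he", "it", "or",
              "put", "with", "work", "works"}

def embedded_words(text, dictionary):
    """Every dictionary word that occurs somewhere inside `text`."""
    return sorted(word for word in dictionary if word in text)

def segmentations(text, dictionary):
    """All ways to split `text` so that each letter lies in exactly one word."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            # Keep this word and segment whatever follows it.
            for tail in segmentations(text[end:], dictionary):
                results.append([head] + tail)
    return results

if __name__ == "__main__":
    print(embedded_words("heworkswithcomputers", DICTIONARY))
    print(segmentations("heworkswithcomputers", DICTIONARY))
```

Of the nine embedded words, only the sequence he, works, with, computers covers the string without gaps or overlaps, so `segmentations` returns a single interpretation.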

In the OCR problem, two additional levels 
of complexity are introduced. Since the 
position of the scanning grid varies 
randomly with respect to any character 
[27], the same character may have many 
possible prototype code sequences (Figure 
2). Other factors increasing this 
variability are the evenness of the 
lighting and the ink absorbing 
characteristics of the paper. Secondly, 
when adjacent characters are in contact, 
the initial or final sequence of codes of 
a character may be distorted or lost. 
The OCR must rely on the information in 
the central portion of the character. 
For some of the narrow characters (such 
as 1,1,r,t and the punctuation marks), 
this poses a major difficulty. 



Figure 2. The prototype code (a number 
between 1 and 79) versus horizontal 
sampling position is shown for the upper 
case character L. The vertical dashed 
lines indicate sampling positions for a 
sampling rate equal to the vertical 
sampling rate. Note some prototype codes 
represented by the two dots in the plot 
have a very narrow range of existence. 
Such codes are easily missed during the 
training process. 

To allow for this variability, the string 
matching procedure was implemented as a 
network search. A cross reference table 
was associated with every prototype code. 
This table lists all the characters that 
reference this code and its relative 
position in each character. In the 
previous example, the cross reference 
table associated with the letter o would 
include the following possible 
interpretations: 

(or 1) (work 2) (works 2) (computer 2) 
(computers 2) (program 3) ... 

The number associated with each word 
indicates the position of the character o 
in the word. 

Given an unknown string, the algorithm 
attempts to create an interpretation list 
which satisfies the following constraint, 
where $c_i$ represents the letter at column 
$i$ and $(w_i\ s_i)$ is one of the 
interpretations corresponding to $c_i$. Two 
interpretations $(w_i\ s_i)$ and 
$(w_{i+1}\ s_{i+1})$ corresponding to $c_i$ 
and $c_{i+1}$ are compatible if one of the 
following conditions is satisfied: 

1. $w_{i+1} = w_i$ and $s_{i+1} = s_i + 1$, 

or 

2. $w_{i+1} \neq w_i$, $s_i$ is the last 
character position of $w_i$ and $s_{i+1}$ 
is the first character position of $w_{i+1}$. 

In our OCR application the two conditions 
were relaxed to allow the additional 
variability. 
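
For the word-level example, the cross reference table and the compatibility test can be sketched as follows. This is a minimal illustration, not the relaxed OCR version: positions are 1-based as in the (or 1) example, condition 2 is taken to mean that one word ends and a different word begins, and the requirements that the string start at the first letter of a word and end at the last letter of one are added assumptions. All function names are hypothetical.

```python
def build_cross_reference(dictionary):
    """Map each letter to the (word, position) pairs that contain it."""
    table = {}
    for word in sorted(dictionary):
        for position, letter in enumerate(word, start=1):
            table.setdefault(letter, []).append((word, position))
    return table

def compatible(left, right):
    """True if two neighbouring interpretations satisfy condition 1 or 2."""
    (w_left, s_left), (w_right, s_right) = left, right
    # Condition 1: same word, next character position.
    same_word = w_right == w_left and s_right == s_left + 1
    # Condition 2: one word ends and a different word begins.
    boundary = (w_right != w_left and s_left == len(w_left)
                and s_right == 1)
    return same_word or boundary

def consistent_labelings(text, table):
    """All interpretation lists whose adjacent pairs are compatible."""
    if not text:
        return []
    # Assumption: the string starts at the first letter of a word.
    chains = [[interp] for interp in table.get(text[0], [])
              if interp[1] == 1]
    for letter in text[1:]:
        chains = [chain + [interp]
                  for chain in chains
                  for interp in table.get(letter, [])
                  if compatible(chain[-1], interp)]
    # Assumption: the string ends at the last letter of a word.
    return [chain for chain in chains
            if chain[-1][1] == len(chain[-1][0])]

def words_of(chain):
    """Collapse a chain of (word, position) pairs into its word list."""
    return [word for word, position in chain if position == 1]
```

Because each chain is extended only by compatible interpretations, most candidates die out after a few letters; for 'heworkswithcomputers' and the nine-word dictionary a single chain survives.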


IMPLEMENTATION AND RESULTS 

A cross reference table was created by 
resampling and vector quantizing the 
characters at a high sampling rate. In 
order to limit the table to a workable 
size, the positional information, 
$s_i$, was stored at lower resolution. On 
average, there were 27 interpretations 
associated with each of the 79 prototype 
codes. 
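
One way such a table could be assembled is sketched below. The paper does not give its distance formula, so the run-based `distance`, its `run_penalty` weight, the number of coarse position `slots` and all function names are hypothetical; only the ingredients (number of black runs, their position and their length) follow the description of the distance measure.

```python
def run_lengths(column):
    """Run-length encode the black runs of a binary column as
    (start, length) pairs, from top to bottom."""
    runs, start = [], None
    for index, sample in enumerate(column):
        if sample and start is None:
            start = index
        elif not sample and start is not None:
            runs.append((start, index - start))
            start = None
    if start is not None:
        runs.append((start, len(column) - start))
    return runs

def distance(column_a, column_b, run_penalty=16):
    """Hypothetical distance from the number, position and length of runs."""
    runs_a, runs_b = run_lengths(column_a), run_lengths(column_b)
    cost = run_penalty * abs(len(runs_a) - len(runs_b))
    for (start_a, len_a), (start_b, len_b) in zip(runs_a, runs_b):
        cost += abs(start_a - start_b) + abs(len_a - len_b)
    return cost

def quantize(column, prototypes):
    """Index of the closest matching prototype in the prototype list."""
    return min(range(len(prototypes)),
               key=lambda code: distance(column, prototypes[code]))

def build_code_table(samples, prototypes, slots=4):
    """Map each prototype code to the (character, coarse position) pairs
    in which it was observed; `samples` maps a character to its columns."""
    table = {}
    for character, columns in samples.items():
        for index, column in enumerate(columns):
            code = quantize(column, prototypes)
            # Store the position at low resolution to limit the table size.
            position = index * slots // len(columns)
            entry = (character, position)
            if entry not in table.setdefault(code, []):
                table[code].append(entry)
    return table
```

With the two toy prototypes `[1, 1, 1, 1]` and `[0, 0, 0, 1]`, a three-column L contributes its stem to code 0 and its foot to code 1, while a one-column I shares code 0 with the stem of the L.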

The recognition software was divided into 
two sections. In the first pass, 
character fragments were extracted from 
the code list by searching for the longer 
chains of compatible interpretations. In 
the next pass, the algorithm attempted to 
patch the character fragments together in 
order to obtain a consistent 
interpretation. Frequently, the algorithm 
would return several consistent 
interpretations; they were ranked by the 
average character visibility in the text 
window. 

The algorithm was implemented in compiled 
LISP on our Symbolics 3640 workstation 
with 512K words memory and without a 
floating point accelerator. A string of 7 
touching characters was processed in 
approximately one minute. Eighty percent 
of this time was spent preprocessing the 
data and vector quantizing the sampled 
column vectors. The use of a VLSI 
pattern matching chip for performing the 
vector quantization [28] could reduce 
this time. 

The algorithm had almost no difficulty 
identifying the isolated characters. 
However, it was necessary to use the 
maximum horizontal resolution (double the 
vertical resolution). The algorithm did 
not always distinguish some of the 
punctuation marks (:;) and confused the T 
with the three character sequence 'I'. 

Upper case  Interpretations            Lower case  Interpretations
THAT        THAT THAT THAT THAT        that        that thar
WITH        WUHWIIH                    with        with wirh
THIS        mm                         this        this
FROM        FROM PROM FRCM FRCM        from        from fan fan fan
HAVE        HAVE                       have        have
THEY        THE* THE THEf THE          they        they
WHICH       WHICH                      which       which which whinh
WERE        WERE WERE                  were        were wete wert; wen;
THERE       THERE THERE THBRE THBRE    there       thereto
WHEN        WHEN WHEN                  when        when
WILL        WILL WE                    will        will
MORE        MORE MORE                  more        more mote mere more
SAID        SAID SAID                  said        said said said
WHAT        WHAT WHAT                  what        what
ABOUT       ABOUT ABOUT ABOUT ABOUT    about       abcut about about
ONLY        OMYONIY                    only        only cnly
OTHER       OTHER OTHER OTHER OTHER    other       other orher erher
SOME        SOME SCME 80ME 8CME        some        some seme some sotne

Figure 3. Results of applying the recognition algorithm on some common 4 and 5 letter 
English words displayed in TIMESROMAN12 font in both upper case and lower case 
characters. The second and fourth columns contain the list of possible character strings 
which match the input in decreasing order of average character visibility. 

The algorithm was next tested on a set of 
the twenty most common 4 and 5 letter 
words, which were displayed in both upper 
and lower case touching characters. The 
algorithm usually returned several 
consistent interpretations (e.g. more, 
mote, mere, more); however, the first 
choice was almost always the correct 
choice (Fig. 3). Occasionally, the 
algorithm failed to recognize a 
particular letter (e.g. Y in THEY). The 
failure was traced to a missing entry in 
the cross reference table, which resulted 
in a broken chain of interpretations for 
this letter. 

The next test consisted of applying the 
OCR to the TIMESROMAN15 font. Though the 
resampling process automatically 
normalizes the character dimensions, the 
smoothing filter was fixed and resulted 
in some small differences for the larger 
font. The OCR initially failed to 
recognize the new characters in this 
font, but when the cross reference table 
was extended to include these characters 
the algorithm was able to handle both 
fonts. 


DISCUSSION OF RESULTS 

The purpose of this study was to examine 
the feasibility of using string matching 
techniques to isolate and recognize 
proportionally-spaced characters which 
touched or overlapped. We represented the 
unknown string of characters by a 
sequence of equally spaced sample column 
vectors which were extracted from the 
text. The column vectors were 
categorized into about eighty types and 
their codes were sent to the classifier 
which applied string matching techniques 
to extract and identify the characters. 

Our simulations have so far been 
restricted to a computer-generated font, 
TIMESROMAN12, on a Symbolics workstation. The 
algorithm succeeded in identifying most 
of the characters but encountered 
difficulties distinguishing some pairs 
such as (r,t), (o,c), (1,1) and (T,'l') 
when the beginning or end of the 
characters were obscured by neighbouring 
characters. 

Our preliminary investigations do not 
demonstrate any clear advantage of the 
string matching approach over the 
conventional template matching schemes 
(e.g. [8]). Despite the use of a cross 
reference table to accelerate the 
matching process, the recognition process 
was still complex. A sequence of 
filtering processes was required to 
remove many of the character fragments 
which tended to impede the recognizer. 

The accuracy measurements indicate that a 
higher vertical sampling rate and a 
larger prototype list would be needed to 
preserve the details of the characters. 
Furthermore, in order to handle italics 
and Greek mathematical symbols, the 
prototype list would need to double in 
size. Increasing the prototype list 
would increase the cross reference table, 
the network search time and the 
sensitivity of the recognizer to any 
image noise. 


ABSTRACT 


In images such as contour maps, fingerprints, 
and electric fields, regions of contour lines can be 
distinguished, and these regions are often used for 
image understanding. In this work, such images are 
collectively termed contour line images. The 
objectives are to determine the properties by which 
contour line regions are characterized, and to 
develop an approach using these properties to 
automatically determine regions. An algorithm is 
proposed to group lines into regions. This is based 
in part on the parallel-adjacency criterion which is 
defined here. The algorithm has been applied to 
several contour line images, and the resultant 
regions are shown. 

KEYWORDS: contour line regions, image 

segmentation, pattern recognition 

1. Introduction 

It is a simple matter for a person to locate a knot in 
wood grain, or to recognize a pattern in marble. 
Patterns on contour maps can be recognized as 
locations of rivers or steep grades. Other constant-value 
plots such as flux line fields in electromagnetics 
and isobar maps in meteorology are interpreted by the 
pattern of lines (rather than by individual lines) to 
understand global information in the image. 
Differences in the shape of contour patterns in 
fingerprint images can be used, even by non-experts, 
to distinguish different prints. All these examples 
involve images of contour lines in which regions of 
line patterns are discerned by human viewers, and 
from which interpretation or recognition is performed. 
Although human recognition of these patterns seems 
trivial, it is no trivial task for a computer. In this 
paper, we examine properties which characterize 
regions in contour line images, and propose an 
approach for performing automatic determination of 
these regions. 

We wish to deal with contour line images, 
irrespective of what the contour lines represent. The 
expression contour line image will refer to an image 
made up of lines (where lines can be curved) which 
run approximately parallel to adjacent lines, at least 
over short distances. The spacing between adjacent 
lines is often small, and the lines may be densely 
packed. Therefore, groups of lines appear as regions 
rather than boundaries. Examine Figure 1, and 
determine the number of contour line regions in each 
diagram based on the “similarity of pattern” within 
each region. Our experience shows that there is a 
general consensus in the number of regions perceived 
for each. It is this response that we desire to emulate 
by machine. (Our informal examination of the human 
perception of these regions is inadequate to claim that 
this is a universal response; however, this does not 
lessen the interest in proposing a method for 
emulating what we have found to be a common 
response.) The common responses given for the 
number of regions in each diagram of Figure 1 are 
shown in Figure 2. 


2. Method 

Our objectives are to determine the properties which 
characterize contour line regions, and to describe an 
approach which utilizes these properties in order to 
segment regions. It is necessary to clarify some 
terminology before describing the properties and 
approach. A contour line, or line, is a straight or 
curved sequence of contiguous points between two 
endpoints. A segment is a straight line fit to a portion 
of a line. We will refer to regions of lines, and groups 
of segments. 

First, we combine descriptions of some of the 
properties of contour line images into the expression 
parallel-adjacency. The parallel-adjacency criterion 
specifies that if a pair of segments are: 

1. adjacent (i.e. not separated by other lines), 

2. close in distance, 

3. approximately parallel, and 

4. overlapping by a specified amount, 

then the lines to which the segments belong are 
potentially in the same region. In the algorithm to be 
described below, pairs of segments are first compared 
for parallel-adjacency, and groups of line segments are 
built by these pairwise comparisons. Then, in the 
same manner, adjacent lines are compared pairwise to 
determine if they are similarly comprised of segments 
of the same parallel-adjacency groups. In this way, 
lines are combined into contour line regions. 
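
A minimal sketch of the parallel-adjacency test for a pair of straight segments is given here. It assumes each segment is a pair of distinct endpoints ((x1, y1), (x2, y2)); the thresholds `max_gap`, `max_angle` and `min_overlap`, the midpoint-connector test for adjacency, and all function names are illustrative assumptions rather than the parameters of the algorithm.

```python
import math

def _direction(segment):
    """Unit direction vector and length of a non-degenerate segment."""
    (x1, y1), (x2, y2) = segment
    length = math.hypot(x2 - x1, y2 - y1)
    return ((x2 - x1) / length, (y2 - y1) / length), length

def _midpoint(segment):
    (x1, y1), (x2, y2) = segment
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def angle_between(a, b):
    """Acute angle in radians between the directions of two segments."""
    (ax, ay), _ = _direction(a)
    (bx, by), _ = _direction(b)
    return math.acos(min(1.0, abs(ax * bx + ay * by)))

def separation(a, b):
    """Distance from the midpoint of b to the line through a."""
    (ux, uy), _ = _direction(a)
    (px, py), (mx, my) = a[0], _midpoint(b)
    return abs((mx - px) * uy - (my - py) * ux)

def overlap_fraction(a, b):
    """Part of b's projection onto a that lies within a, as a fraction
    of the shorter segment."""
    (ux, uy), length_a = _direction(a)
    _, length_b = _direction(b)
    px, py = a[0]
    t = sorted((x - px) * ux + (y - py) * uy for x, y in b)
    shared = min(t[1], length_a) - max(t[0], 0.0)
    return max(0.0, shared) / min(length_a, length_b)

def _crosses(p, q, segment):
    """True if the connector p-q properly intersects `segment`."""
    def side(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    r, s = segment
    return (side(p, q, r) * side(p, q, s) < 0
            and side(r, s, p) * side(r, s, q) < 0)

def parallel_adjacent(a, b, others=(), max_gap=2.0,
                      max_angle=math.radians(15), min_overlap=0.5):
    """Apply the four conditions: adjacent, close, parallel, overlapping."""
    adjacent = not any(_crosses(_midpoint(a), _midpoint(b), other)
                       for other in others)
    return (adjacent
            and separation(a, b) <= max_gap
            and angle_between(a, b) <= max_angle
            and overlap_fraction(a, b) >= min_overlap)
```

Two horizontal segments one unit apart satisfy the test, but no longer do so once a third segment passes between them, since they are then separated by another line.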

The main steps of the algorithm are listed below: 

1. Split lines at all junctions (bifurcations and 
crosses in lines). 

2. Perform piecewise straight-line fitting so that 
each line is comprised of straight line segments. 

3. Construct an adjacency list of the straight-line 
segments. This segment adjacency list (SAL) 
contains, for each segment, all other segments 
meeting criteria for distance proximity, 
approximate parallelism, and non-zero overlap 
with respect to that segment. 

4. Merge the segments of the SAL into groups on 
the basis of pairwise similarity of line segments 
due to the parallel-adjacency criterion. The 
result is the segment group list (SGL). 

5. Consider each line in its entirety (made up of the 
straight line segments), and group the lines, 
again in pairwise fashion, based on line 
adjacency and similar composition of line 
segments from the SGL. The result is the line 
region list (LRL) containing line composition of 
each contour line region. 
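
Steps 3 to 5 amount to two rounds of pairwise merging, which can be sketched with a union-find structure. The sketch takes the parallel-adjacency test as a caller-supplied predicate and reads "similar composition" as "sharing at least one segment group"; that reading, the omission of a separate line adjacency test, and the function names are assumptions made for illustration.

```python
def group_segments(segments, are_parallel_adjacent):
    """Steps 3 and 4: merge segments into groups by pairwise comparison.
    Returns one group label per segment."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if are_parallel_adjacent(segments[i], segments[j]):
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(segments))]

def group_lines(line_of_segment, group_of_segment):
    """Step 5: merge lines whose segments fall in a common segment group.
    Returns the list of regions, each a sorted list of line labels."""
    lines = sorted(set(line_of_segment))
    region = {line: line for line in lines}

    def find(line):
        while region[line] != line:
            line = region[line]
        return line

    first_line_in_group = {}
    for line, group in zip(line_of_segment, group_of_segment):
        if group in first_line_in_group:
            region[find(line)] = find(first_line_in_group[group])
        else:
            first_line_in_group[group] = line

    regions = {}
    for line in lines:
        regions.setdefault(find(line), []).append(line)
    return sorted(regions.values())
```

For five horizontal one-segment lines at heights 0, 1, 2, 20 and 21, with a predicate that accepts segments at most one unit apart, the first three lines form one region and the last two another.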

3. Results 

The algorithm was applied to each of the diagrams in 
Figure 1. The results are shown in Figure 2. For these 
synthesized images, which contain no noise, the 
regions found by the algorithm are consistent with our 
expectation and approach. Figure 3 shows the results 
of the algorithm as applied to a thinned fingerprint 
image which contains broken, short, and isolated lines. 
The three largest regions (in terms of number of lines per 
region) which are found by the algorithm are shown. 
We are currently working to better establish the 
relationships of the algorithm parameters to the image 
characteristics — especially for images containing 
noise. 


4. Summary 

The objectives addressed in this work are to determine 
properties which characterize contour line regions, 
and to automatically distinguish those regions. In the 
context of this work, contour line images consist of a 
large number of lines, where adjacent lines are closely 
spaced, overlap, and are approximately parallel. From 
these properties, the parallel-adjacency criterion is 
defined and used to associate piecewise segments of 
different lines into groups. Contour line regions are 
then found by pairwise merging of lines which are 
similarly comprised of groups of segments. 
Experiments have shown that the regions determined 
by the algorithm as applied to both synthetic and real 
images are consistent with our approach. 

A short description of the method has been given 
in this extended abstract. Continuing work will give a 
better understanding of the performance of the 
algorithm for different conditions and types of images, 
and a more detailed description will be given in a 
future paper. 

Figure 2. Contour line regions found by the algorithm for the images in 
Figure 1. Number of regions: (a) 2, (b) 2, (c) 1, (d) 2, (e) 1, (f) 2, 
(g) 2, (h) 3. 

Figure 3. Fingerprint image, and the regions found. 




AUTHOR INDEX / REPERTOIRE DES AUTEURS 


Adjouadi, M. 307 
Archibald, C.C. 293 
Armstrong, W.W. 147 
Badler, N.I. 115 
Barr, J.M. 331 
Barsky, B.A. 241 
Booth, K.S. 82, 91, 194 
Bouthemy, P. 350 
Brault, J.-J. 375 
Brzakovic, D. 366 
Cachola, D.G. 152 
Caelli, T.M. 343, 356 
Calvert, T.W. 121, 300 
Carroll, J.M. 186 
Coggins, J.M. 229 
Cruchant, H. 284 
Curcio, C.A. 385 
Davis, W.A. 235, 287 
DeRose, T.D. 241 
Drewery, K. 131 
ElMaraghy, H.A. 15 
Fay, F.S. 229 
Ferch, H.J. 11 
Fogarty, K.E. 229 
Forest, L. 213 
Forsey, D.R. 194 
Fournier, A. 49, 164 
Franklin, W.R. 26 
Fuchs, H. 193 
Gardner, B.R. 20 
Glass, G.J. 180 
Goldberg, M. 273 
Goodenough, D.G. 266, 273 
Grant, E. 104 
Green, Marc 337 
Green, Mark 71, 147 
Greene, N. 108 
Grindal, D.A. 164 
Gross, J.R. 241 
Hanrahan, P. 56 
Heckbert, P.S. 207 
Hewitt, S. 121 
Higgins, T.M. 82 
Hill, D. 136 
Holynski, M. 20 
Hong, W. 260 
Hoskins, J.A. 7 
Hoskins, W.D. 7 
Hutber, D. 361 
Hwang, C.H. 287 
Jernigan, M.E. 325 
Kamal, M.R. 380 


Kenk, E. 279 
Kubota, K. 390 
Kurz, R. 380 
Lake, R. 147 
Lefevre-Fonollosa, M.-J. 284 
Levine, M.D. 260, 380 
Lewis, J.P. 173 
Lewis, J.W. 32 
Liakopoulos, A. 366 
Liang, P. 313 
MacKay, S.A. 98 
Magnenat-Thalmann, N. 213 
Malowany, A.S. 380 
Mansouri, A.-R. 380 
Meyers, D. 385 
Myers, B.A. 62 
Nagendran, S. 356 
Nemoto, K. 43 
Nguyen, P.T. 267 
Nichols, M. 26 
Nisselson, J. 1 
Olsen, Jr., D.R. 66 
Omachi, T. 43 
Oppenheimer, P. 254 
Ostrovsky, R. 20 
O'Gorman, L. 396 
Paeth, A.W. 77, 91, 194 
Peachey, D.R. 37 
Pearce, A. 136, 217 
Pellicano, P.N.E. 370 
Pentland, A.P. 223 
Plamondon, R. 375 
Plunkett, G.W. 273 
Pointing, T. 188 
Prusinkiewicz, P. 158, 247 
Rambaud, D. 213 
Ridsdale, G. 121 
Samaddar, S. 26 
Schlag, J.F. 202 
Schoeler, P. 49 
Schrack, G.F. 152 
Shlien, S. 390 
Sims, P. 361 
Singh, G. 71 
Sloan, Jr., K.R. 385 
Sondheim, M. 279 
Sternberg, S.R. 293 
Streibel, D. 158 
Tanimoto, S.L. 349 
Tanner, P.P. 98 
Thalmann, D. 213 
Thornton, R.W. 180 




Todhunter, J.S. 313 
Tsotsos, J. 131 
Tuori, M. 188 
Turpin, D. 279 
Walford, A.E.J. 325 
Walters, D. 318 
Wang, X. 235 
Weil, G.I. 396 
Whitted, T. 104 
Wilhelms, J. 141 
Wu, P. 26 
Wyvill, B. 136, 217 
Wyvill, G. 136, 217 
Xie, S. 300 
Yee, B. 279 





















